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A  new  Bayesian  procedure  for  factor  analysis  is  developed  in  which 
factor  scores  as  well  as  factor  loadings  and  error  variances  are  treated 
as  parameters  of  interest.  The  presentation  is  fully  Bayesian  in  the 
sense  that  all  the  parameters  have  prior  distributions  and  the  posterior 
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The  model  is  a  standard  one  where  the  observations  are  expresssed 
as  the  sum  of  the  linear  combination  of  factor  scores,  with  factor 
loadings  being  the  weights,  and  a  normal  error  term.  As  the  prior  distri¬ 
bution  the  following  exchangeable  form  is  assumed: 

A  factor  score  vector  for  each  observation  has  a  common  normal 
distribution. 

A  factor  loading  vector  for  each  variable  has  a  common  normal 
distribution. 

A  error  variance  for  each  variable  has  a  common  inverted  chi 
square  distribution. 

When  the  exchangeability  of  all  the  observations/variables  is  in  question 
observations/variables  may  be  divided  into  several  subsets  and  the 
observations/ variables  within  each  subset  may  be  treated  as  exchangeable. 

Since  the  posterior  marginal  distribution  of  factor  loadings  and 
error  variances  can  be  expressed  as  the  product  of  the  covariance-based 
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The  mode  of  the  hyper parameters  is  first  derived  from  their  posterior 
marginal  distributions  and  conditional  on  those  values  the  mode  of  error 
variance  is  derived  from  their  posterior  marginal  distributions.  Then, 
conditional  of  those  estimates,  the  point  estimate  of  factor  scores 
and  factor  loadings  are  derived  as  the  joint  or  the  marginal  mode  of  the 
posterior  distribution  of  factor  scores  and  factor  loadings  depending  on 
the  investigator's  interest. 

The  marginalization  is  done  via  some  variations  of  the  EM  algorithm 
and  it  is  found  that  the  different  variations  result  in  almost  identical 
estimates.  It  is  also  found  that  the  effect  of  the  prior  distribution  of 
error  variances  is  such  that  it  reduces  the  number  of  local  maxima. 
Finally,  by  specifying  a  priori  zeros  in  the  locational  hyper parameters 
of  factor  loadings,  a  simple  structure  can  be  obtained  without  rotation. 
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Bayesian  Factor  Analysis 

CHAPTER  I 
INTRODUCTION 


Factor  analysis  is  a  multivariate  statistical  method  used  to  explain  the 
relationships  among  observed  variables.  Simply  stated,  standard  factor 
analysis  assumes  that  each  observed  variable  is  a  weighted  sum  of  two  sets  of 
random  variables,  namely,  common  faotor  scores  and  unique  scores,  all  of  which 
are  unobservable.  The  purpose  of  the  method  is  to  estimate  the  weights,  or 
factor  loadings  associated  with  each  variable  and  to  estimate  the  factor  scores 
associated  with  each  person.  A  typical  application  of  the  factor  analysis 
method  consists  of  the  calculation  of  a  correlation/dispersion  matrix  of  the 
observed  variables,  which  contains  the  sufficient  statistics  under  the  nsual 
model,  estimation  of  the  weights,  statistical  testing  of  the  model,  and 
interpretation  of  the  derived  latent  variables.  Therefore,  much  of  the 
literature  of  faotor  analysis  is  concerned  with  how  to  estimate  factor 
loadings,  how  to  test  the  model  statistically,  and  how  to  find  a  meaningful 
interpretation  of  those  latent  variables. 

Sometimes,  however,  it  is  desirable  to  go  further  and  to  estimate  the 
values  of  those  latent  variables  associated  with  each  observation.  For 
example,  the  vector  unfolding  model,  whioh  is  often  used  to  analyse  the 
underlying  structure  of  preference  among  a  set  of  stimuli,  has  essentially  the 
same  model,  Carroll ( 1972 ) ,  or  Bechtel(1976) ,  and  the  scale  values  of  each 


stimulus  mst  be  known  in  practical  applications.  In  applications  of 
Spearman's  theory  of  general  intelligence  it  is  the  value  of  'g'  that  is  of 
interest  in  all  applications.  Also,  the  congeneric  test  aodel  in  classical 
test  theory  nses  the  sane  model  as  the  factor  analysis.  Lord  and  Novick  (1968). 
where  the  true  score  is  represented  by  the  factor  soores.  Therefore,  if  the 
individual  true  scores  are  needed  they  must  be  estimated.  (See  Chapter  III  for 
the  detail.)  However,  as  we  shall  see  in  Chapter  II,  these  values  cannot  be 
determined  uniquely  under  the  standard  factor  analytic  model  because  they  are 
not  treated  as  parameters  of  the  model  but  rests in  as  random  variables  even 
after  the  factor  loadings  are  estisiated.  This  general  problem  is  known  as  the 
indeterminacy  of  factor  scores,  which,  was  probably  first  pointed  out  by  Wilson 
(1928)  and  later  elaborated  by  Guttman  (1955).  In  this  sense  the  standard 
factor  analytic  model  was  called  the  random  factor  analytic  (RFA)  model  by 
McDonald  (1979)  and  Anderson  (1984). 

The  developawnt  of  a  model  which  enables  us  to  estiamte  the  values  of 
those  hypothetical  concepts,  namely,  the  factor  soores,  is  not  new.  For 
example,  Lawley  (1942),  Whittle(1952) ,  Anderson  and  Rnbin(1956), 
Joereskog(1963),  McDonald  and  Bnrr(1967),  and  McDonald( 1979b ) ,  have  considered 
this  problem,  although  usual  textbooks  do  not  discuss  this  model  in 
detail, (see,  for  example,  Harman's (1976,  sec. 2. 3)  treatment.)  However,  none  of 
those  methods  were  successful  in  the  sense  of  providing  unique  estimates  of 
factor  scores.  As  we  shall  see  later,  this  is  due  to  the  fact  that  the 
likelihood  function  of  the  fixed  factor  analytic  (FFA)  model,  in  which  the 
factor  scores  are  treated  as  parameters,  is  unbounded  above,  which  implies 
nonexistence  of  maximum  likelihood  estimates. 


The  purpose  of  this  thesis  is  to  develop  a  method  that  enables  the 
estimation  of  factor  scores  as  parameters.  Dae  to  the  natare  of  the  problem 
the  treatment  is  based  on  Bayesian  fixed  factor  analysis.  In  Chapter  II  we 
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first  provide  a  brief  review  of  the  random  factor  analytic  model  its 
classical  and  Bayesian  estimation  procedures.  Then,  in  Chapter  III,  the 
classical  fixed  factor  analytie  model  is  introduced  and  its  new  Bayesian 
treatment  is  discussed  in  Chapter  IV.  It  is  shown  that  the  method  proposed  is 
sore  general  in  the  sense  that  both  the  RFA  and  the  FFA  models  are  included  as 
special  cases.  In  Chapter  V,  an  evaluation  of  the  new  method  is  presented 
based  on  some  real  and  artificial  data  sets. 
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REVIEW  OF  THE  RANDOM  FACTOR  ANALYTIC  (RFA)  IJODEL 


Model 


Factor  analysis  is  one  of  several  multivariate  statistical  methods 
studying  the  underlying  relationships  between  observed  variables.  It  assusws 
that  each  observed  variable,  y^,  j«l,2,...,p,  can  be  represented  as  a  sum  of 

three  components: 


(2.1.1)  Xj  -  .j  ♦  X'l1  Vj.  1  *  v 


where  the  m  is  the  overall  swan  of  variable  j,  the  f  ,  e*l,2,...,r, 

J  9 

r  <  p,  are  latent  (unobserved)  variables  called  coosnon  factors,  the  a.  , 

J* 

e=l,2,...,r,  are  the  weights  (called  factor  loadings)  that  link  the  e**1 
factor  to  variable  j,  and  the  u  are  other  latent  variables  called  unique 

J 

factors  for  variable  j.  The  nusiber  of  latent  common  factors,  r,  is  usually 
referred  to  the  nusiber  of  dimensions.  Arranging  p  observed  variables  together 

(2.1.1)  can  be  written  as 

(2.1.2)  i  ■  m  +  Af  +  u. 


*J'  -  I  *.  1,  P  x  r 
P  j« 
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A  *  (n^a^ 


f  -  Ifj.fj . frJ*.  f  i  1, 

-  “  ^Ul'U2'*,"°p*  *'  P  x  !• 

Using  E,  D,  and  C  to  represent  expectation,  dispersion  and  covarianoe 

operators,  respectively,  the  following  specifications  are  typically  Bade: 

(2.1.3)  E(f)  -  0 

(2.1.4)  D(f)  - 

where  is  the  r  x  r  factor  correlation  aatrix 

with  diag(Cp)  ■  1^, 

(2.1.5)  E(u)  -  0. 


(2.1.6)  D(n)  -  D, 


where  D  is  the  p  x  p  diagonal  swtrix  consisting  of  dj's. 


(2.1.7)  C(f,n)  -  0,  (zero). 


We  call  this  aodel,  following  NcDona Id ( 1979b ) ,  the  Random  Factor  Analysis  (RFA) 
model,  in  the  sense  that  f  ia  treated  as  a  random  variable. 


Under  the  RFA  model  we  can  deduce  the  following: 

(2.1.8)  E(X)  -  m. 


(2.1.9)  D(y)  -  Q  -ACpA'  +  D, 


C(j,f)  -  ACp, 

(2.1.10)  E(jlf)  -  m  +  Af , 
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(2.1.11)  D(y|f)  =  D. 

The  last  relation,  (2.1.11),  is  sometimes  called  'partial  linear  independence', 
Joe re slog  and  Soerbom(1979,  ch.l),  or  'weak  local  independence', 

McDonald(1979a) .  The  implication  of  (2.1.11)  is  that,  given  f,  the  unique 

scores  are  not  correlated  with  each  other  and  that  with  respect  to  f  the 

conditional  variances  of  the  unique  scores  are  homoscedastic  within  each 
variable.  It  should  be  noted  that  the  nuaiber  of  dimensions,  r,  is  defined  by 

(2.1.11) .  That  is,  r  is  the  minimum  nuaiber  of  common  factors  such  that  the 
conditional  dispersion  autrix  of  the  observed  variables  given  the  common 
factors  is  diagonal.  The  only  substantive  assuaiptions  of  the  KFA  model  are 
that  r  <  p  and  that  the  variables  are  conditionally  independent  and 
homoscedastic  (2.1.11).  Beyond  that  the  sx>del  is  siaiply  a  decomposition,  cf. 
Lord  and  Novick,  1968,  Ch.  24. 

It  can  be  shown  that  if  the  original  variables  are  rescaled  by  p  x  p 
diagonal  matrix  V  and  a  p  x  1  vector  v  so  that 

(2.1.12)  y*  =  V(y  -  v) . 

then  the  resulting  variables  have  the  moan  and  dispersion, 
respectively, 

(2.1.13)  E(jr*)  =  V(m  -  v). 

d(2*)  -  vnv. 

Therefore,  the  change  of  original  scale  results  in  the  corresponding 
rescaling  of  the  mean,  factor  loadings  and  unique  variance,  nasiely, 

a  «  V(m-v) ,  A  3  VA,  and  D  *  VDV. 


When  we  estimate  parameters  from  the  observations,  whether  this  property  holds 
among  the  estimates  depends  on  what  method  of  estimation  is  used.  It  is  known 
that  maximum  likelihood  esthaates  and  Bayesian  mean,  median,  and  modal 
estimates  have  this  property.  Some  other  estimation  procedures  do  not  have 
this  property. 

It  is  well  known  that  the  model  (2.1.2)  through  (2.1.7)  is  not  unique. 
Consider  the  transformation 

(2.1.14)  f*  =  T'f , 

where  T  is  the  r  z  r  nonsingular  matrix  with  diag(T'CpT)*Ir. 

With  this  new  latent  variable  we  can  rewrite  the  model  as 

(2.1.15)  y  =  m  +  Bf*  +  u. 

where  B  -  A  (T')_1, 

which  has  the  same  form  as  the  original  model. 

With  this  new  ^arametrization,  we  have 

<2.1.16)  El  f*  ]  =  0. 

(2.1.17)  D[  f*  ]  -  T'CpT, 

(2.1.18)  D[  y  1  -  B  T'CpT  B'  +  D, 
and 

(2.1.19)  Cl  jr.f*  ]  -  B  T'CpT  -  ACp  T. 

This  implies  that,  given  one  set  of  factors,  we  can  always  transform  them  into 
a  desired  form  by  appropriately  choosing  the  transformation  matrix  T.  This  is 
known  as  rotational  indeterminacy.  Por  example,  if  the  T  matrix  has  the  form 
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L4  Martin  and  McDonald's  prior  on  D 

Combining  the  prior  distributions  of  A  and  in  LI  through 

L3  with  the  prior  distribution  of  D  of  the  form 

the  density  of  d^  is  proportional  to  Exp[(-l/2)Vj/d^] , 

j*l,2,. ...p,  independently, 
where  v^'s  are  prior  constants, 

we  have  L41,  L42,  and  L43 ,  say,  respectively. 


For  those  elements  of  A  of  whioh  we  have  strong  prior 
information, 

a  :  N(  a*  ,  G  ), 

where  a  is  the  column  roll-out  of  those  elements  of  A. 


For  the  rest  of  the  elements  of  A,  of  which  the  information 
is  vague,  a  locally  uniform  prior  is  assumed. 

For  each  element  of  D,  independently. 


the  density  of  proportional  to  1/d^ , 


is  used. 


e 

a  ,  G,  and  h  are  the  prior  constants. 
A  and  D  are  mutually  independent. 


A,  Cp,  and  D  are  notoally  independent 


L2  Non  exchangeable  factor  loadings 


For  those  elenents  of  a^'s  001  fixed,  independently 


,2  •  2  . 

a.  Is.  :  N(  a  .  .  s.  ), 

je  je  je  je 


d  w.  /a2  :  x2(  d  ), 

je  Je  je  je 


For  Cp  and  D  the  same  priors  as  LI  is  used. 


e 

a  ,  d  ,  v  ,  for  those  elenents  of  A  not  fixed, 
je  je  je 

are  the  prior  constants. 

A,  Cp,  and  D  are  aatxutlly  independent. 


L3  Noninformative  prior 

For  those  elenents  of  A  not  fixed, 

Sj^  locally  unifom. 

For  C^,  hierarchically, 

C^lR  :  Wr(  R  .  g  ), 

density  of  R  is  proportional  to  Isl  ^r+2^2 


g  is  the  prior  constant 


2.  Likelihood  f one t Ion 


LI  through  L4  and  PR  Sample  dispersion  matrix  it  assumed  to  have 

the  Vishart  density  in  (2.2.9).  The  likelihood 
in  (2.2.10)  is  nsed. 

KP  Sane  as  above  with  nonzero  off  diagonal  element  of  D. 

WO  The  data  matrix  T  is  assumed  to  have  the  Normal 

density  in  (2.2.12).  the  likelihood  will  be  need. 


3.  Prior  dlstribntion 

LI  Exchangeable  factor  loadings 

For  those  a^'a  which  are  not  fixed,  hierarchically, 

a  la  , s  :  N(  a  ,  s  ),  i.i.d., 

e 

a  locally  uniform, 
wd  It"  X*(d). 

For  Cp. 

C^1  :  Wr(  R  ,  g  ). 

For  each  element  of  D,  independently, 
hjVj/dj  :  X*(  hj  ). 

d,  w,  R,  g,  and  v  'a,  and  h  *s  are  prior 
J  J 


constants. 
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S.  Evaluate  the  posterior  distribution  in  order  to  find 
som  values  of  the  parameters  which  best  represent  the 
posterior  distribution,  (location  and  dispersion.) 

In  tbs  following  section  a  step  by  step  comparison  of  seven  existing  Bayesian 
methods  will  be  presented.  These  methods  are: 

(LI)  Lee's(1981)  ease  1. 

(L2)  Lee 's( 1981)  ease  2. 

(L3)  Lee '»( 1981)  oase  3. 

(L4)  Lee  ’s(1981)  oase  4. 

(PR)  Press' (1982). 

(KP)  Kaufmann  and  Press *(1973) . 

(10)  Wong’s(1980). 

(Each  method  will  be  denoted  by  the  abbreviation  presented 
in  the  parenthesis.) 

Beoause  Martin  and  MoDonald's(1973)  method  can  be  treated  as  a  special  oase  of 
Lee's  method,  it  will  not  be  considered  explicitly. 


Stepwise  Comparison  of  Bayesian  Methods 
1.  Parameters  of  interest 

LI  through  L4  A,  Cp,  and  D  with  som  of  the  elements  of  A  fixed. 
PR  A  and  D, 

IP  A  and  D,  with  D  not  restricted  to  be  s  diagonal, 

r  not  restricted  to  be  less  than  p. 

10  m.  A,  and  D. 


approximation  to  the  A  distribution. 

To  test  the  hypothesis  with  some  of  the  elements  of  A  or  Cp  being  fixed, 

say.  to  zero,  a  slightly  different  procedure  is  esiployed.  This  approach  is 
called  confirmatory  factor  analysis  or  the  restricted  model,  (Joereskog 
(1969)].  Treating  Cp  as  a  paraawter  in  the  model,  if  the  pattern  of  the 

values  fixed  is  such  that  it  enables  the  unique  maximum  of  the  likelihood,  the 
minimization  is  performed  without  restrictions  (2.2.15)  and  (2.2.16)  with 
respect  to  D  and  those  elements  of  A  and  Cp  which  are  not  fixed.  If  this  is 

not  the  case  the  minimization  will  be  performed  subject  to  soaw  additional 
condition  which  guarantees  a  unique  solution.  In  either  case,  soaw  adjustment 
of  the  degrees  of  freedom  is  necessary.  Essentially,  d.f.  is  equal  to 
(2.3.4)  p(p+l)/2  -  number  of  free  parameters  to  be  estimated 

+  number  of  independent  restrictions  on  the  paraawters. 

For  further  details,  see  Joereskog(1969) . 


The  Review  of  Bayesian  Estiamtlon  Methods  in  UFA  Model 

In  this  section  a  review  of  existing  Bayesian  estimation  methods  in  the 
UFA  model  is  presented.  In  general,  Bayesian  estimation  proceeds  as  follows: 

1.  Identify  the  paraawters  of  interest. 

2.  Specify  the  conditional  distribution  of  data  given 

parameters,  namely,  the  likelihood  function. 


3.  Specify  the  form  of  the  prior  distribution  of  the  parameters. 

4.  Find  the  posterior  distribution  of  the  parameters  given  the  data. 


for  the  resulting  normal  equations.  Sines  the  grand  naan  m  is  treated  as  known 
in  their  method,  the  result  is  the  sane  as  the  MLE  based  on  the  L^ 
likelihood. 

Goodness  of  Pit  Test  in  UFA  Model 

Vith  large  N,  it  is  known,  Joereskog  (1967),  that  minus  twice  the  log 
likelihood  ratio 

(2.3.1)  LLR  -  -2(  (  L^.D^S)  ) 

-  ln(  |sf  (N_1>/2Exp[-(N-l)/2)tr(s"1S)]  ) 

-  (N-l)(  F1(A+.D+)  -  ln(lsl)  -  p  ) 
is  distributed  as 

LLR  :  X2(  (  (p-r)2  -  (p+r)  )/2  ). 

The  inside  of  the  seoond  logarithm  term  is  the  ▼sine  of  evaluated  at 

Q  -  (N/(N-1))S  .  Therefore,  the  hypothesis 

(2.3.2)  Hr  :  Q  -  AC^A'  +  D  with  specified  r, 

ean  be  tested  against  the  alternative  hypothesis 

(2.3.3)  Hq  :  0  is  positive  definite  synawtrie, 

with  approximate  signifieanee  level  alpha,  by  coopering  the  value  of  LLR  to  the 

(1-alpha)  percentile  point  of  the  X2  distribution,  i.e.,  if  LLR  exeeeds  the 
percentage  point,  will  be  rejected.  Bartlet(1951)  has  suggested  that  the 

use  of  N-l-(2p+5)/N-2r/3  in  place  of  N-l  in  (2.3.1)  in  order  to  obtain  a  better 
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matrix,  thus,  the  Q  matrix,  most  be  calculated  by  (2.2.23)  each  time  D  ia 
updated  tinea  wo  are  treating  F ^  at  the  function  of  D  only.  The  algorithm 

nanally  converge*  rapidly  to  a  local  minimum.  It  it  known,  however,  that  the 
minimum  often  liet  on  the  region  where  tome  of  the  error  variance*  are  zero, 
i.e.,  a  Haywood  cate. 

Another  approaoh  to  the  ML  estimation  of  the  RFA  model  it  found  in  Rubin 
and  Thayer  (1982)  where  the  EM  algorithm  ia  used.  (  See  Chapter  IV  for  the 
explanation  of  the  EM  algorithm.  )  Their  solution  it  the  iteration  of  the 
following  two  steps: 

The  E-step: 

(2.2.26)  W  -  (  D  +  ACpA'  )-1  A<^. 
and 

(2.2.27)  Q-lf'SItV  I"1, 
where 

(2.2.28)  V  -  -  (^(DMC^A')"^. 


The  M-step: 

(2.2.28)  A  -  S  W  Q, 

and 

(2.2.29)  D  -  diag(  S  -  SHOTS  ). 

The  method  it  derived  by  writing  the  Lj  likelihood  discussed  in  Chapter  III 

in  terms  of  the  sufficient  statistics,  S,  (1/N)Y*F,  and  (1/N)F*F,  replacing  the 
corresponding  quantities  by  their  conditional  expectations  given  Y,  namely,  S, 

Si,  and  Q  *,  differentiating  the  result  with  respect  to  A  and  D,  and  solving 


should  b«  taken.  Also,  from  (2.2.18)  vs  have,  given  A, 

(2.2.24)  D  -  diag(  S  -  ACpA'  ). 

Therefore,  the  iteration  of  (2.2.23)  and  (2.2.24)  gives  the  desired  minimum  of 
when  the  process  converges.  However,  this  method  given  in  Lawley  and 

Maxwell  (1963)  is  known  to  be  very  slow  to  converge.  Therefore,  the  following 
method  is  nsnslly  used  [Lawley  and  Maxwell  (1971)  or  Joereskog  (1967)1. 
Noticing  the  fact  that  the  conditional  minima  of  Fg  with  respect  to  A  given 

D  is  obtained  analytically  by  (2.2.23)  we  may  consider  F^  as  a  function  of  D 

only.  That  is,  the  derivative  of  F^  with  respect  to  D  can  be  evaluated  as 

(2.2.25)  d  F2/d  dj  -  dFj/ddj  +  tr(  OF^A]  'OA/dd^l ), 

j  -  1,2,....  p. 

However,  the  second  tern  vanishes  at  the  point  where  (2.2.23)  is  satisfied 
since  3F2/d  A  is  zero  if  the  A  matrix  given  in  (2.2.23)  is  used  to  evaluate 

it.  Therefore,  the  derivative  of  F2  with  respect  to  D,  when  it  is  regarded 

as  a  function  of  D  only,  is  given  by  (2.2.18)  with  the  A  matrix,  thus  the  0 
matrix,  defined  by  (2.2.23).  Given  the  derivatives,  the  minimization  of  F2 

can  be  performed  via  several  existing  numerical  methods  which  do  not  require 
the  inforaiation  provided  by  the  second  derivative  of  F2  with  respect  to  D. 

For  example,  the  Fletcher-Powell  method,  which  is  advocated  by  Joereskog 
(1967),  determines  the  direction  of  search  for  the  minimus  by  approximating  the 
inverse  of  the  second  derivative  matrix  using  the  information  provided  by  the 
first  derivatives  only.  It  sould  be  noted  that  when  the  evaluation  of  F2  is 

necessary,  say,  in  the  cubic  search  along  the  direction  determined,  the  A 


to lotion  based  on  is  ss  follows.  The  partial  derivatives  of  ?2  with 
respeet  to  A  and  D  are: 

(2.2.17)  3F2/3A  -  2(  Q*1A-{f1sf1A  ). 
and 

(2.2.18)  ap2/ao  -  diag(  or1-®**”1  ). 

(  See  the  Appendix  for  matrix  differentiation.  ) 

By  setting  (2.2.17)  equal  to  zero  wo  have 

(2.2.19)  SD_1A  -  A. 

But,  by  Lawley't  triek  in  the  Appendix,  it  can  be  reexpressed  as 

(2.2.20)  SD_1A(I  +A'D_1A)_1  -  A, 

r 

thus, 

(2.2.21)  SB-1  A  -  A  (Ir+A'D_1A), 
or. 

(2.2.22)  D~1/2SD_1/2  d“1/2A  -  D_1/2A  (I^A'D^A) . 

This  form  shows,  with  the  restriction  (2.2.15),  that  (I^+A'D  *A)  is  the 
-1/2  -1/2  —1/2 

eigen  values  of  0  SD  and  that  D  A  is  the  assoeiated  eigen 
vectors.  That  is,  given  D,  the  A  matrix  which  minimises  P2  csn  be  written  as 

(2.2.23)  A  -  D1/2Q(L+Ir)1/2, 

-1/2  -1/2 

where  L  is  the  eigen  value  matrix  of  D  SD  and  Q  is  the 
assoeiated  orthonormal  eigen  vectors. 

It  can  be  shown  that  in  order  to  minimise  P,  the  r  largest  eigen  valoes 


*  ExpK-mHrUY-l^')  ‘(Y-l^')')) 

-  lfi|(_W2)  *  B*p[(-N/2)tr(0_1S)] 

*  Exp[  (-N/2)  (y  -m)'!)"1^  -m)J, 

•  e 

where  j  -  (l/N)^  1. 

Notioing  that  regardless  of  the  value  of  Q,  is  maximized  at 

(2.2.13)  m+  -  j  , 

e 

the  H£  of  A  and  D  can  be  found  by  sin ini zing  the  nono tone  decreasing  function 
of  Lj. 

(2.2.14)  F2(A.D)  -  lnlQl  +  tr(Q  -1S). 

Because  is  not  a  Monotone  function  of  the  resulting  solution,  when  N 
is  not  so  large,  is  different. 

In  both  methods,  in  order  to  remove  the  rotational  indeterminacy  the 
restrictions 

(2.2.15)  A'D_1A  is  diagonal, 
and 

(2.2.16)  Cp  -  Ir, 

are  enforced.  If  the  factors  are  aasuawd  to  be  correlated,  the  correlated 
factor  will  be  found  by  finding  the  transformation  matrix  P  described  in 
(2.1.14)  through  (2.1.20). 

Several  numerical  methods  are  available,  for  example,  see 
Lawley  and  Maxwell (1963, 1971)  or  Joeresfcog(1967,1977) .  The  outline  of  tlm 


(2.2.6)  S  -  (1/N)Y'JY 


where 

(2.2.7)  Y  ”  '•  N  *  P» 

•ad 

(2.2.8)  J  -  IN  -  (l/N)^*, 

has  the  Vishart  distribution.  Press(1972), 

(2.2.9)  N  *  S  :  Vp(  ACpA*  +  D  .  N-l  ), 

whara  V  (  C  .  df  )  denotes  tha  Vishart  distribution  with 
P 

the  expectation  df  x  C  and  tha  degrees  of  freedom,  df . 

The  likelihood  of  A  and  D  is  proportional  to 

(2.2.10)  L^A.DlS)  —  |ar(N"1)/2Exp[-(N/2)tr(0_1S)], 
where  the  synbol  “  denotes  the  proportionality. 

Therefore,  in  this  formulation,  the  statistical  estimation  of  the  original  UFA 
parameters  reduces  to  the  estimation  of  the  covariance  structure  shown  in 
(2.1.9)  under  the  Vishart  probability  model.  (Usually,  m  is  estimated  by  the 

sample  overall  mean.)  The  maxima  likelihood  estimates  (MLS)  of  A  and  0  are 
found  by  minimizing  a  monotone  decreasing  function  of 

(2.2.11)  F1(A,D)  -  lnlQl  +  (N/(N-l))tr(G_1S). 

Although  this  is  the  most  commonly  used  1L  solution  there  is  another  H 
estimation  procedure  based  on  the  original  density  of  Y  described  in 
Anderson  and  Bnbin(19S6)  and  Mardia,  at.,  al.(1979).  The  likelihood  of  m.  A, 

and  D  given  Y  can  be  written  as 

(2.2.12)  L2(m,A,D|Y)  —  |ol<"N/2) 


where  fi  it  defined  in  (2.1.9), 

N  (  ■  ,  Q  )  denotes  the  p-variate  normal  distribution 
P  ~ 

with  the  mean  vector  m  and  the  dispersion  matrix  Q, 

and  the  symbol  is  nsed  to  denote  the  distributional  lav. 

By  applying  Rao's  (1973)  formulae  8a. 3(1)  and  8a.2.9 

it  can  be  shown  that  if  we  replaoe  (2.1.7)  by  its  stronger  form, 

(2.2.2)  f  and  u  are  statist ieally  independent, 

then  (2.2.1),  together  with  (2.1.2)  through  (2 .1.6),  imply, 

(2.2.3)  f  :  Nr(0,(^) 

and 

(2.2.4)  u  :  N  (0,D) . 

“  P  ~ 

Therefore 

(2.2.5)  y  I  f  :  N  (m*Af,D). 

p  -  - 

Because  of  normality,  (2.2.5)  implies  the  local  independence  of  g  given  f 

in  the  sense  of  Lord  and  Novick(1968,ch.24) .  However,  it  is  not  true  that  the 
normality  of  g  end  the  weak  looal  independence  in  (2.1,11)  imply  the  strong 

looal  independence  in  (2.2.5)  and  the  independence  of  f  and  u  in  (2.2.2). 

Conversely,  in  order  to  derive  the  normality  of  g  in  (2.2.1),  we  can  start  with 

(2.1.2) ,  the  independence  of  f  and  u  in  (2.2.2),  and  their  normality  in 

(2.2.3) ,  and  (2.2.4). 

Having  observed  N  independent  observations,  g^,  i-l,2,...,N,  it  follows 
that  the  sample  dispersion  matrix 
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rand on,  and,  therefore,  cannot  be  determined.  Some  interesting  discussions  of 
this  point  frost  a  Bayesian  point  of  view  are  found  in  Bartholomew  (1981)  and 
Mardia,  Kent,  and  Bibby  (1979). 

Historically,  the  HFA  model  is  stated  in  terms  of  the  correlation  matrix, 
~l/2 

that  is,  nsing  [diag(Q)]  and  a  as  the  scaling  constants  V  and  v, 

respectively,  in  (2.1.12),  (2.1.9)  can  be  written  as 

(2.1.22)  D(y* )  -  diag(0)“1/2  0  diag(Q)"1/2  -  R.  say. 

where  R  is  the  population  correlation  matrix  of  the  observed  variables. 

However,  instead  of  the  population  mean  and  standard  deviation,  whieh  are  not 
observable,  if  we  use  the  saag>le  analogues  of  those  (fuantities  as  the  scaling 

constants,  (2.1.22)  is  no  longer  eorrect  since  j  is  not  a  linear 

transformation  of  jr.  Similarly,  if  we  treat  those  estimates  as  fixed  scaling 

constants,  (2.1.22)  is  not  eorrect  either  since  in  this  case  R  is  not  the 
population  correlation  matrix  but  siaqply  a  rescaled  covariance  smtrix. 

Therefore  throughout  this  paper  we  do  not  refer  to  the  correlation  matrix. 


Maximum  Likelihood  Estimation  in  KFA 

In  this  section  we  briefly  review  the  Maximum  Likelihood  (ML)  estimation 
procedure  of  the  KFA  parameters,  namely  m.  A,  and  D.  In  order  to  perform  ML 

estimation  it  is  necessary  to  introduce  the  following  full  distributional 
assumption: 


(2.2.1)  v  :  N  (m  .  0) 
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(2.1.20)  T  -  P  if 1/2  Q  P*. 
where  (^  ■  P  1  P'  is  the  normalized  e 

end  Q  is  any  r  x  x  ortho no  nasi  natrix, 

the  new  variable  f  has  the  ssae  correlation  natrix  C^.  If  the  T  natrix 

has  the  fora 

(2.1.21)  T  -  P  M-1/2  Q. 

the  new  variables  become  orthogonal.  i.,e.,  D[  f  )  ■  1^. 

It  should  be  also  noted  that  under  the  RFA  model  f  is  treated  as  a  latent 

random  variable  and  therefore  cannot  be  estimated  in  the  nsual  statistical 
sense,  i.e..  after  m,  A,  and  D  have  been  estimated  f  and  n  still  remain  as 

random  quantities.  The  so  called  'problem  of  tnctor  score  indeterminacy*  stems 
partly  from  this  fact.  Usually,  in  order  to  'estimate*  the  value  of  f 

associated  with  each  object,  i“l,2, .. . ,N,  some  arbitrary  least  squares 
criterion  is  introduced  and  the  value  which  minimizes  the  criterion,  given  the 
estimate  of  A  and  D,  is  sought.  Without  these  additional  criteria  it  is  known, 
Onttman(1955) ,  that,  as  a  linear  combination  of  g,  the  factor  scores  cannot  be 

uniquely  determined  even  if  we  have  estimated  A  and  D  uniquely.  (See  the 
Appendix  for  the  derivation  of  the  Qottman/Ketselman  formula.)  That  is,  given 
A  and  D,  we  can  construct  an  infinite  number  of  sets  of  f  and  u  which  satisfy 

the  model  (2.1.2)  through  (2.1.7).  Shiba  (1969)  has  identified  at  least 
sixteen  methods  with  different  oriteria.  Simply  stated,  given  the  particular  A 
matrix  and  conditioning  on  the  observed  value  of  j,  f  and  u  in  (2.1.2)  remain 


igen  decomposition  of  Cp, 


Ia  this  s»del,  instead  of  using  the  (A,D)  psriawtrizition, 
the  prior  density  of  (A,Q)  is  assuasd  to  be  proportional  to 
f(A,Q>  —  Bap[(-l/2)tr(A-A2)V2(A-A2)'0)l 

x  |or(h+p+1)/2E*p[(-l/2)tr(&_1Q<A))]. 
where  Q(A)  -  SQ  +  (A-A^V^ArAj) * , 

and  h,  Vj,  V^,  A^,  A^,  and  Sq  are  the  prior 

constants. 

This  fora  of  density  implies  that  given  Q,  A  is  a  truncated 
matrix  normal  distribution,  and  that  given  A,  0  is  a 
truncated  Viabart  distribution.  The  truncations  are  done 
so  that  D  -  Q  -  AA*  is  positive  definite. 

For  a, 

a  :  N  (  0  ,  0  ),  0_1  ->  0  (zero). 

“  p  —  in  n 

For  each  element  of  A,  independently, 

2 

a  N(  0  ,  s  ),  ... ,r. 

je  e 

Because  the  main  purpose  of  Wong's  study  is  the  marginal 
maximua  likelihood  estimation  of  D,  no  prior  distribution 
of  0  is  used. 

a  and  A  are  mutually  independent. 
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s  '#  are  the  prior  constants. 
0 


Wong  also  proposed  a  prior  of  the  fol loving  form: 
m  same  as  before. 

For  each  column  element  of  A,  independently  and  hierarchically, 

a  la  N  (  a  1  ,  s  i  ), 

-e  e  p  e-p  e  p 

a  :  N(  0  ,  %"m  ),  a2  — >  infinite. 

e  *e  *e 

Empirioal  Bayes  estimation  of  the  prior  constants,  namely. 


a  ,  e«l,2,...,r,  is  discussed  in  his  paper  briefly. 
9 


4.  Posterior  distributions 


•  2 

After  integrating  a  and  s  out, 
f (A,D,Cp|S)  —  Lj(A,D) 

_ .  r  ..  ,2  .  .  ,-(nA+d-l)/2  . 

x  T^i  l  (a  -a  )  -HJw/n.J  J 

ja  e  e  A 

x  |Cpr(|“r"1)/2Bxp[(-l/2)tr(BCp1)] 

*  T?  I  d -(^42,/2  ] 
x  Exp[(-l/2)^j^1t  hjVj/dj]  ], 


where  denotes  the  product  of  those  n^  elements  of  A 


which  arc  not  fixed 
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f(A.D.C^|S)  —  LjCA.D) 


x  (  the  last  three  factors  of  Li  ). 


L3 

f  (A.D.CpIS)  —  L^A.D) 

x  |CFI^1,/2 

x  (  the  last  two  factors  of  LI  ) 


LAI,  L42,  and  M3 

f(A,D,CplS)  of  these  esses  are  the  aaae  as  the  ones  in 
LI.  L2,  and  L3,  respeotiTcly,  with  the  last  two  factors 
replaced  by  '|^j_1lExp[<-l/2)Vj/dj]  1. 


PR 

f(A.DlS)  —  L^A.D) 

x  loP^Expl (-1/2) (a-a*)  'G(a-*a*)] . 


f(A.D|S)  —  LJ(A,D)f (A,Q) , 
where  f(A,0)  is  defined  in  the  above. 


where  i  is  the  oolumn  roll-out  of  A, 


S'1  -  [Vj  x  S"1]  +  CV2  x  G], 

S  -  Otf<N-l)S/<h+N-l>, 

• 

a+  -  SjKVj  x  s"1)^  +  (V2  x  G)x21, 

•j  and  «2  are  the  column  roll  out  of 

and  Aj,  respectively, 

and  x  denote*  the  Kronecker  product. 

Wong(1980)  does  not  folio*  the  usual  Bayesian  approach.  Instead,  he  suggests 
the  Maximization  of 

L(D|Y)  -  f  f(A|D.Y)  dA. 

which  is  the  marginal  likelihood  of  D.  Be  suggests  the  use  of  the  EM  algorithm 
vith  numerical  integration  with  respect  to  A  above.  For  another  set  of  the 
priors  he  also  suggests  the  same  algorithm  to  perform  the  hyperparameter 
estimation. 


Discussion 

As  for  the  choice  of  the  likelihood  all  the  authors  but  long  started  vith 
the  sample  dispersion  matrix.  This  is  due  to  the  fact  that  m  is  of  no  interest 

in  typical  applications.  However,  as  shown  in  the  posterior  distribution  of 
Wong's  method,  the  posterior  distribution  based  on  the  likelihood  of  S,  namely, 
L^,  oan  be  obtained  as  a  marginal  posterior  distribution  of  the  1^  based 

posterior  distribution.  In  the  limiting  case  when  0  1  — >  zero,  it  can  be 

B 


shown  that  with  the  normal  prior  of  m,  the  posterior  marginal  of  A  and  D  (  and 

C^)  is  the  product  of  the  Vishart  kernel  and  the  prior  of  A  and  D  (,  and 

Cp).  Therefore,  the  net hod  based  on  seeas  mo re  general.  On  the 

contrary,  the  grand  naan  is  so  well  estiaated  by  the  sample  swan,  it  aay  be 
better  to  exclude  a  froa  the  set  of  paraaeters  of  interest  in  order  to  Bake  the 

resulting  posterior  distribution  siaple. 

The  assuaption  employed  in  Kaufman  and  Press(1973)  that  the  off  diagonal 
elesmnts  of  D  are  not  xero  is  a  questionable  one.  It  clearly  oontradicts  the 
usual  assumption  of  weak  local  independence,  (2.1.11),  or  its  strong  fora, 
(2.2.5).  However,  as  Kaufaann  and  Press  suggest,  it  accounts  for  the 
possibility  of  specification  error,  that  is,  if  the  posterior  distribution  of  D 
is  not  concentrated  on  the  diagonal  matrix  then  it  iaiplies  that  the  number  of 
dimensions  specified,  r,  is  too  saall  to  explain  the  covariance  structure  of 
the  fora  in  (2.1.9).  Therefore,  as  long  as  we  can  evaluate  the  posterior 
distribution  of  D,  the  nonzero  off-diagonal  assuaption  of  D  seeas  to  provide 
useful  information  for  determining  the  number  of  dimensions. 

As  for  the  form  of  prior  distribution,  the  most  interesting  contrast 
exists  between  Lee's  treatment  of  A  and  that  of  Press.  That  is,  while  Lee 
as suae s  those  eleaents  of  A  for  which  we  have  strong  information  to  be  fixed. 
Press  places  strong  prior  on  those  elements.  Taking  the  specification  error 
problem  into  account.  Press'  approach  seeas  superior. 

Finally,  when  the  posterior  distribution  is  coaplex,  there  always  exists  a 
problea  of  the  choice  between  the  exact  aodal  estimate  and  the  approximation  of 
the  posterior  distribution  by  some  evalua table  density  such  as  the  nultivariste 


normal  distribution.  Although  the  exact  evaluation  of  the  (joint)  node 
provides  an  exact  value  of  one  indicator  of  the  location  of  the  posterior 
distribution,  it  does  not  provide  the  dispersion  information  at  all.  On  the 
other  hand,  if  we  approximate  the  posterior  distribution  we  can  have  both 
location  and  dispersion  information  with  less  accuracy  in  the  sense  that  those 
are  the  approximations.  In  face  of  the  complexity  of  the  posterior 
distribution  this  is  an  open  question. 


REVIEW  OF  THE  CLASSICAL  FIXED  FACTOR  ANALYTIC  (FFA)  MODEL 


Modal 

As  stated  bafora  the  common  factors  f  ara  treated  as  random  latent 

variables  in  the  RFA  model.  This  is  doe  to  the  fact  that  the  RFA  model  was 
developed  in  close  relation  to  the  classical  test  theory  model  in  which  each 
subject  is  often  treated  as  a  random  observation  and  the  a a sumption  of 
normality  is  to  some  extent  reasonable.  However,  there  are  some  areas  in  whioh 
it  is  impossible  to  have  a  random  saaple  of  subjects,  yet  the  model  (2.1.1) 
seems  reasonable.  Also,  the  value  of  f  associated  with  each  object,  whioh  is 

by  definition  impossible  to  estimate  in  the  usual  statistical  sense  under  the 
RFA  model,  is  often  needed  in  many  applications.  In  this  chapter  a  model  in 
which  the  comnon  factors  are  treated  as  fixed  quantities,  namely,  the  Fixed 
Factor  Analysis  (FFA)  model,  will  be  reviewed. 

The  FFA  model  starts  with  (2.1.1)  and  its  matrix  equivalent  form 
(3.1.1)  Y  *  lj^n'  +  FA'  +  0, 


or,  equivalently 
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F  “  lMi . V* 

*  l-(l)'-(2) . -(r)1 ' 

5“  lV'2 . V'* 

0  *  * 

"  ^-(l)'-(2)',‘"-(p)1  * 

(When  we  refer  to  the  column*  of  each  matrix  we  attach  parentheses  to  the 

th  th 

subscript ,  that  is,  f^j  and  f ^  stand  fi»r  the  1  column  and  the  i 
row  of  the  F  matrix,  respectively.) 

In  order  to  avoid  the  redundancy  of  the  paraaietrixation  the  restrictions 

(3.1.2)  l'F  ■  0' , 

and 

(3.1.3)  (1/N)F'F  -  .  where  diag(Cp)  -  l*. 

are  enforced.  Since  these  restrictions  are  the  counterparts  of  (2.1.3)  and 

(2.1.4)  in  the  UFA  model,  there  still  exists  the  rotational  indeterminacy  if 
is  identity.  The  distributional  assumption 

(3.1.4)  u(j)  :  jy  0  .  djIN  ).  J-1,2,...,  p, 

is  usually  mads.  The  equivalent  form  of  this  assumption  is 

(3.1.5)  u,  :  N  (  0  .  D  ),  i-l,2,...,N,  i.i.d., 

-i  p  - 

where  D  is  the  p  x  p  matrix  whose  diagonal  elements 
consist  of  d^'s. 

From  (3.1.1)  and  (3.1.4)  it  is  clear  that  the  FFA  model  is  equivalent  to 


tlw  multivariate  regression  nodal  with  unknown  regressor  matrix  end  with 

diagonal  error  dispersion  matrix,  or  to  the  **group  regression  model  with 
nnknown  common  regressor  matrix  with  heterosehedastic  error  varianoes, 

[Noviok,  at.,  al.  (1972).] 

The  likelihood  of  the  set  of  parsmeters,  namely,  P,  A,  a,  and  D,  is  given 
by 

(3.1.6)  L3(m.F.A.D|Y) 

■  *5-.1  i 

—  T^.jt  djN/2E,p[(-l/2il,)CI  1  ), 

where 

°j  "  <Z( j)~™j-~F-J)  <Z( 
or,  equivalently. 

(3.1.7)  L3(m,F.A.D|Y)  |d|~  ^ExpU-l^trlD^Q] , 

where 

(3.1.8)  Q  ■  (Y-lm'-FA*) '(Y-lm'-FA') . 

As  pointed  out  by  Anderson  and  Snbin(1956) ,  however,  this  likelihood  is  not 
bounded  above,  that  is,  if  any  one  of  the  quadratic  forms,  Qj,  say,  in  the 

exponent  is  equal  to  zero  the  likelihood  goes  to  infinity.  The  simplest  way  to 
avoid  this  problem  is  to  extend  the  withiit-variable  homo seeds a tioity  assumption 
(3.1.4)  to  aeross-variable  homo seeds sticity,  that  is, 

(3.1.9)  u(j)  :  jyo  .  d^  ),  j-1,2 . . 

where  d  is  a  scalar  common  to  all  the  variables. 
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The  MJE  under  this  nodal  with  tha  rastrictioas  (3.1.2)  sod  (3.1.3)  with 
(^•I^  is  given  by  tha  Eckart-Young  (1936)  decomposition  of  the  data  aatriz 

Y  after  colons  cantering.  That  is,  tha  astiaates  of  a.  F,  and  A  are  given, 

respectively,  by  tha  colonn  swans  of  Y  aatriz,  tha  first  r  eigen  vectors  of 
JYY'J  normalised  so  as  to  satisfy  (3.1.3),  and  tha  first  r  eigen  vectors  of  the 
saapla  dispersion  aatriz  S  normalized  so  that  A’A  “  L,  the  eigen  valne  aatriz. 
This  is  also  the  least  squares  solution  for  the  aodel  (3.1.1)  and  the  resulting 
F  aatriz  is  equivalent  to  tha  first  r  principal  a opponent  scores  of  the  aatriz 
JY.  The  distributional  assuaption  (3.1.9),  which  assumes  that  all  the 
variables  have  equal  unique  variances,  however,  seena  to  be  too  restrictive 
even  if  the  original  variables  have  the  saae  variances.  A  better  way  is  to 
assume  either  that  the  unique  variances  are  known  or  that  they  are  proportional 
to  soae  known  constants.  For  ezaaple,  in  case  of  supposedly  un id  lawns ioanl 
tests,  tha  assumption  that  tha  error  variance  of  each  test  is  proportional  to 
the  length,  as  in  Feldt  (1973),  may  be  used. 

Another  way  to  avoid  the  problem  is  to  have  replications.  Denoting  the 

k**1  replication  by  Y^  and  the  corresponding  error  term  by 

k*l,2,...,q,  the  likelihood  of  the  parameters  given  q  replications  is  ezpressad 
as 

(3.1.10)  L3(a,F,A.DlY(1).Y(2),...,Y(q)) 

—  |DrqN/2E*pl(-l/2)tr[D_1Q+n, 

where 

(3.1.1D  q+  -  X=i[  Q<k)  ]* 


i-V- 


-I  til  -  ••  in  ft  1 1  iifti 
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and 

(3.1.12)  Q(k)  =  (Y(k)-lm,-FA,),(Y(k)-l«,HFA'). 

Since  the  diagonal  elements  of  Q+  cannot  be  aero  (assuming  each  Y*k*  is 

distinct)  the  likelihood  is  bounded  above.  The  following  estimating  equations 
are  given  by  taking  the  derivatives  of  the  likelihood  plus  the  Lagrange  term  to 
enforce  (3.1.3)  with  Cp*Ir» 

trKF'F  -  (N) Ijj)L*1  . 

where  L*  is  the  r  x  r  symmetric  unknown  matrix, 
and  setting  the  si  equal  to  zero.  (See  the  Appendix  for  matrix  differentiation.) 

(3.1.13)  m  ■  (1/N)Y*  *1, 

where  Y*  -  (1/q)  Y(k)  ]. 

(3.1.14)  D  -  (l/qN)diag(Q+) . 

(3.1.13)  A  *  (1/N)Y*  *JF. 

(3.1. Id)  F  -  JY'D_1A(A,D-1A+L*)~1. 

The  last  two  equations  reduce  to 

(3.1.17)  RD~1/2AT  =  D~1/2A1L, 

where 

R«  (1/N)  d'1/2Y,'JY*D~1/2, 

L  is  a  r  x  r  arbitrary  diagonal  matrix, 
and  T  is  a  r  x  r  arbitrary  orthonormal  matrix, 
and 


La.  ^  -f 
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(3.1.18)  FT  =  N  JY’D-1AL-1. 

Arbitrariness  of  T  and  L  accounts  for  the  orthogonal  rotation.  Therefore,  by 
setting  T  -  1^,  the  conditional  estimate  of  A  given  D  is  given  as  the  eigen 

vectors  of  R  normalized  snch  that  A'D  m  L,  the  eigen  valoes. 

Some  other  treatswnts  of  this  problem  have  been  proposed  in 
Anderson  and  Rub in( 1956),  Anderson  (1984),  and  McDonald(1979b) .  While  the 
former  two  consider  the  estimation  of  the  factor  loadings  and  the  error 
variances  on  the  basis  of  the  noncentral  Wishart  distribution  with  the 
restrictions  (3.1.2)  and  (3.1.3)  with  Cp  »  If,  the  latter  estimates  the 

factor  soores  as  well  as  the  factor  loadings  and  the  error  variances  by 
maximizing  the  likelihood  ratio.  However,  the  method  fails  to  produce  unique 
estimates  of  the  factor  scores  in  the  sense  that  all  the  estimates  that  are 
produced  by  the  Guttman/Kestelman  formula  in  the  Appendix  also  maximize  the 
likelihood  ratio. 

A  Bayesian  treataient  of  this  model  will  be  proposed  in  the  next  Chapter. 


The  Congeneric  Test  Model 

Consider  the  situation  in  which  an  instructor  is  to  grade  his/her  students 
on  the  basis  of,  say,  three  examinations  such  as  two  midterms  and  one  final. 
This  is  usually  done  by  calculating  a  certain  composite  soore  such  as 
A)  swan  of  raw  scores, 
or 


B)  mean  of  standardized  scores 


When  we  assume  that  there  is  only  one  coamon  factor,  that  is,  r  ■  1,  the 


FFA  node 1  becomes,  by  denoting  the  first  colnnn  of  F  and  A  by  f  and  a, 
respectively, 

(3.2.1)  Y  -  la’  +  fa*  +  D, 


or 


'y  '  "j  *  fi*j  *  V  i'1,2 . N- J’l>2 . »• 

which  states,  as  in  the  general  case,  that  the  observed  score  is  the  sum  of 


non-random  and  random  parts,  namely,  m^f^  and  u^,  respectively. 


with  a  particular  structure  being  enforced  on  the  non-random  part.  However, 
the  way  we  decompose  the  observed  score  into  two  parts  is  not  unique.  As 
pointed  out  by  Lord  and  Novick  1968,  Chapters  2  and  24,  we  can  have  another 
decoaqwsition,  namely,  the  classical  test  theoretic  decomposition, 

(3.2.2)  y.j  =  t..  +  eijf  i«1.2 . N,  j=1.2....,p, 

where  t^  is  defined  as  a  non-random  part  over  the  propensity  distribution, 


i.e.,  replications.  These  two  decompositions  sometimes  contradict  each  other 
since  the  factor  analytic  decomposition  is  usually  made  without  any  reference 
to  the  replications.  That  is,  the  unique  score  in  the  factor  analytic 
decomposition  may  contain  some  of  the  non-random  elements,  namely,  the  speoific 
score  of  each  variable  defined  over  replications.  However,  when  we  deal  with 
only  one  set  of  observations,  Y,  it  is  impossible  to  distinguish  the  specific 
score  from  the  unique  score.  Therefore,  we  proceed  for  the  present  as  if  the 
specific  scores  are  zero. 

With  this  understanding  in  mind  (3,2,1)  states  that  the  non-random  part  of 
the  observed  score  of  each  test,  the  true  score,  is  linearly  related  to  each 


+  const.. 
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(4.2.17)  L,,,  -^,1  4,j.  1 
where 


(4.2.18)  Ljjj,  =  -n  ln(s/2)  +  2  In  Gamna(n/2)  +  (n+2)  In  d^  +  %/Ay 

For  the  second  stage  prior  distributions  we  only  assume  the  mutual 
Independence 

(4.2.19)  f(  H  |  S  )  =  f(  Hp.  HA.  Hp  I  S) 

- «  W  f*  W  f<  w* 

and  do  not  elaborate  the  specific  form  until  it  becomes  necessary.  When  a 
priori  zeros  are  to  be  specified  they  should  be  so  specified  here.  The  easiest 
way  is  to  assnme  a  spike  function  as  the  second  stage  prior  distribution  of  the 
hyperparameters  whose  elements  are  assumed  to  be  concentrated  around  zero. 


Posterior  Joint  Distribution 

Using  the  Bayes'  theorem  the  posterior  joint  distribution  of  F,  A,  D,  and 
n  is  given  as 

(4.3.1)  f(  F,  A,  D,  H  t  S,  !  )  =  f (Y|F,A,D)  f (F.A.DlH)  f(H|S), 

Minns  twice  the  log  of  this  density  is  given  by 

(4.3.2)  L  *  Lp.^  +  +  Lp,  +  Lg  +  const. 

'  )i-l[  LAFi  *  Si  '  *  LA  *  4)  *  S)  *  eo,‘,,• 

•  5jSi1  h’Aj  *  L*J  *  Sj 1  *  S  *  h) 4  “"**• 

where 

(4.3.3)  =  tr(  Y-FA’  )D_1(  Y-FA'  )'  ♦  N  In  |dI, 
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(4.2.11)  f(  D  I  "d  >  “  f<  “j  1  V  >■ 

where 

(4.2.12)  £(  d.  I  Hp  )  *-  (e/2)n/2/GeaM(n/2)  d<_1/2)(n+2) 

x  Exp[  (~l/2)«/dj  ].  j-1,2 . p, 

where  Gaama(x)  is  the  |uu  function. 

Note  that  f  and  a,  which  do  not  have  any  snbsoript  or  whioh  do  have  subscript 


k,  are  not  parameters  but  hyperparaaeters. 

The  corresponding  minus  twice  the  log  densities  are, 

(4.2.13)  Lj.  -  L^.  ]  ♦  const., 

where 

(4.2.14)  Lj,.  -  In  ICpI  +  f.'C^. 

if  globally  exchangeable,  i=l,2,...,  N, 

■  *”  "W1  * 

if  locally  exchangeable  and  i  belongs  to  subgroup  k,  k*l,2 


(4.2.15)  La  ■  1  +  const., 

where 


S' 


if  locally  exchangeable  and  j  belongs  to  subgroup  k,  k=l,2 


on  the  first  dimension,  and  the  members  of  the  second  group  have  high  loadings 


on  the  second  dimension,  we  may  specify  that  a^  and  a^  are  concentrated 

around  and  [O.a^l*  respectively.  This  is  a  similar  treatment  of  a 

priori  zeros  described  in  Rubin  and  Thayer  (1982)  but  more  general  in  the  sense 
that  this  does  not  force  the  parameters  to  be  zero,  avoiding  a  potential 
specification  error. 

The  density  functions  are,  respectively, 

(4.2.7)  f(  F  I  Hp  )  “  f(  li  I  Hp  )  ] 

where 

(4.2.8)  f(  f.  |  Hp  ) 

—  ICpf172  Ezp[(-l/2)fi'C^1f.]. 

if  globally  exchangeable,  1*1,2,...,  N, 

—  ICpJ*171 

if  locally  exchangeable  and  i  belongs  to  subgroup  k,  k*l,2,. . . ,G^, 

(4.2.9)  f(  A  I  HA  )  -  f(  Sj  I  Ha  )  J, 

where 

(4.2.10)  f(  fj  I  HA  ) 

—  ICAf1/2  Exp((-l/2)(a  -arc^U  -•)]. 

if  globally  exchangeable,  j*l,2,...,  p, 

“  \CJ~1/2  Exp[(-l/2)(aj-ak)'C^(aj-ak)l, 

if  locally  exchangeable  and  j  belongs  to  subgroup  k,  k-1,2,. .. ,GA» 
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(4.2.6)  :  N^(  a^.  Cj^t  ),  iid,  if  j  belongs  to  group  k'. 
k'-l,2,...,GA. 

where  a^_ ,  end  ,  ere,  respectively,  r  x  1  end  r  z  r. 

In  this  cese  we  heve 

Bp  -  l  (  fk.  ).  1-1.2 . Op  1. 

end 

B»  -  1  <  V  CAi  >•  k*1'2 . °*1- 

A  locally  exchengeeble  prior  distribution  for  the  error  verienoe  is  not 
considered  here  since  in  reel  epplicetion  they  cen  almost  elweys  be  considered 

til 

to  be  globelly  exchengeeble.  Ve  denote  the  number  of  observetions  in  the  k 
locally  exchangeable  group  by  n^  end  the  nuaber  of  variables  in  the  k'^ 
locally  exchangeable  group  by  n^,.  It  should  be  noted  that  (4.2.5)  and 

(4.3.6)  can  be  used  independently.  That  is,  for  exasiple,  we  may  have  a 
globally  exchangeable  prior  on  the  factor  scores  and  a  locally  exchangeable 
prior  on  the  factor  loadings.  Also,  by  setting  one  of  the  inverse  matrices  of 
the  dispersion  hyperparameters  equal  to  zero,  we  can  handle  those 
observations/ variables  whose  a  priori  grouping  information  is  not  clear.  That 
is,  those  observations/variables  with  zero  precision  hyperparameter  matrix  are 
treated  at  having  uniform  prior  distributions. 

Vhen  some  of  the  location  hyperparameters  are  assumed  to  be  concentrated 
around  zero  we  may  specify  this  in  the  specification  of  the  second  stage  prior 
distribution.  For  example,  if  there  are  two  locally  exchangeable  group'  of  the 
variables  and  we  believe  that  the  members  of  the  first  group  have  high  loadings 
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dispersion  matrix, 

(4.2.3)  •  :  N  (  n,  C.  ).  j«l,2,...,p,  iid. 

“J  r  ~  A 

where  a  is  the  r  x  1  vector  of  mean  and  is  the  r  x  r 


dispersion  matrix. 


(4.2.4)  dj  :  X~2( 

where  X  2(  n,s  ) 
with  the  degrees 
With  the  previous 

"p-  i  i-  <y  i. 


n  ,  s  ). 

indicates  the  inverted  chi  square  distribution 
of  freedom  n  and  mean  s/(n-2). 
notation  we  have 


»*■<!•  C»  >• 


and 

Hp  *  I  n,  s  ]. 

The  three  sets  of  prior  distributions,  respectively,  state  that  all  the  factor 
scores,  factor  loadings,  and  the  error  variances  are  globally  axchangeable. 
Since  the  model  is  based  on  column  centered  data  and  considering  the 
snltiplicative  redundancy  between  P  and  A,  we  set  f*0r  and  C^I^.  The 

treatment  of  the  oblique  model  will  be  stated  later. 

When  the  globally  exchangeable  prior  distributions  are  not  appropriate  we 
may  use  the  locally  exchangeable  prior  distributions: 

(4.2. 5)  £k,  ),  iid,  if  i  belongs  to  group  k, 

k"l,2,...,Gp, 

where  f^  and  Cp^  are,  respectively,  r  x  1  and  r  x  r. 


-•  ^  -• 


.1 


VJ 


-'■J 


*  .1 


■  '.S 


variables  require  heavy  knowledge  of  vooabnlary  and  the  rest  are  sore  content 
oriented  we  nay  a  priori  assume  that  there  are  two  locally  exchangeable  groups 
of  variables.  Also,  if  we  believe  that  there  are  some  gender  differences  in 
terms  of  the  factor  soores  but  that  the  factor  loadings  are  invariant  aoross 
sex.  we  ray  assume  the  exchangeability  of  factor  scores  within  boys  and  girls, 
but  not  globally.  The  model  proposed  here  is  general  enough  to  handle  any  of 
these  situation. 

Ve  consider  the  following  forms  for  the  prior  distributions  of  the 

parameters. 

(4.2.1)  f(  F,  A,  D  I  H  )  -  f(FlHp)  f(A|H^)  ftOlHp). 

HA 

distribution  of  F,  A,  and  D,  respectively.  The  independence  assumption  of  F 
and  (A,  D)  seems  to  be  natural  since  knowledge  of  the  characteristics  of  each 
subject  usually  does  not  affect  knowledge  of  the  characteristics  of  variables. 
The  independence  of  A  and  D  ray  not  seem  to  represent  the  real  situation 
considering  the  fact  that  the  expected  dispersion  matrix  of  the  observation  is 
expressed  as  the  sum  of  AA'  and  D,  but  this  is  not  the  case  since  the  prior 
distributions  are  to  be  specified  prior  to  the  data  collection.  That  is,  we 
argue  that  A  and  D  are  independent  until  we  calculate  the  sample  dispersion 
ntrix. 

For  each  component  we  first  assume  the  following  globally  exchangeable 
prior  distributions: 

(4.2.2)  f.  :  N  (  f ,  C) ,  i=l,2 . N,  iid, 

~  1  r  ~  r 

where  f  is  the  r  x  1  vector  of  man  and  is  the  r  x  r 


,  and  Hp  are  the  first  stage  hyperparamters  of  the  prior 


where  Hp» 


the  formulae  the  range  la  not  specified.  The  seeond  stage  hyperparameter  S  is 
typically  specified  to  provide  a  relatively  flat  prior  distribution  for  the  B, 
expressing  ignorance  about  the  location.  This  type  of  prior  distribution  is 
called  an  exchangeable  prior  distibution.  It  should  be  noted  that  the  first 
stage  hyperparameters  H  can  be  constant  if  the  distribution  of  B  is  really 
tight. 

As  a  variation  of  the  exchangeable  prior  presented  above,  we  may  have  the 
following  locally  exchangeable  prior. 

P.  :  f(  P.  I  B  )  ,  iid,  if  k  belongs  to  a  subset  g, 
x  k  g 

B  :  f(  fl  |S),  g  -  1,2,. ...0.  iid. 

g  g 

That  is,  we  divide  k  parameters  into  G  subsets  and  assume  the  exchangeability 
within  each  subset.  Since  the  exchangeability  assumption  is  crucial  in  real 
data  analysis  much  caution  should  be  exceroixed  when  it  is  incorporated. 

In  the  factor  analytic  oontext  there  are  N  individuals,  each  of  which  is 
regarded  as  a  population,  and  the  location  f ^  of  each  individual  is  to  be 


estimated.  Also,  each  of  the  p  variables  represents  p  populations  and  their 


parameters  a^'s  and  dj's  are  to  be  estimated.  Unless  we  are  to  perform 


some  confirmatory  study  it  is  often  difficult  to  specify  the  informative  prior 
distributions  for  all  of  N+( r+l)p  parameters.  Also,  it  is  often  the  case  that 
we  know  that  some  of  the  tests  or  some  of  the  subjects  are  very  siatilar  to  each 
other  prior  to  data  collection.  Therefore,  the  exchangeable  prior  seems  to  fit 
the  typical  application  of  the  model  very  well.  For  example,  if  all  the 


variables  to  be  enalyzed  are  supposed  to  measure  reading  skills  we  may  a  priori 
assume  that  those  variables  are  exchangeable.  However,  if  half  of  the 
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that  ia  propar.  Second,  if  we  have  very  weak  knowledge  we  nae  a  noninformat ive 
prior  distribution  that  is  improper.  The  third  case  is  that  of  an  exchangeable 
prior  distribution  and  is  applicable  when  there  are  several  of  parameters  which 
represent  the  same  characteristics,  say,  looation,  of  different  populations  and 
we  believe  those  parasmters  are  similar  to  each  other.  For  example,  when  the 
means  of  m  normal  populations  are  to  be  estimated,  we  may  know,  a  priori,  that 
those  m  values  are  similar  to  each  other  but  may  not  be  able  to  precisely 
specify  the  'mean*  of  these  m  values. 

The  relative  similarity  of  prior  values  can  be  expressed  as  the  following 
hierarchical  forms.  According  to  the  model  stated  above  it  is  assumed  that 
pk  :  f<  pk  I  H  ),  k-1,2, . . . ,m,  iid. 

H  :  f(  H  I  S  ), 

where  P*  1  P,.  ....  p  ]'  is  the  a  x  1  parameter  veotor  of  interest, 

12  B 

H  is  the  nl  x  1  first  stage  hyperparameter  veotor  and  S  is  the  n2  x  1  second 
stage  hyperparameter  for  the  prior  parameter  H. 

That  is,  we  express  the  relative  similarity  of  all  parameters  by  assuming 
that  they  come  from  the  same  distribution  and  express  the  uncertainty  of  that 
distribution  by  the  probabilistic  structure  of  the  hyperparameters.  The  prior 
distribution  of  the  p^'s  can  be  expressed  without  using  B  if  we  marginalise 

the  joint  distribution  of  P  and  H  with  respect  to  H.  In  this  case  the  prior 
distribution  of  P  can  be  written  as 

f(  P  I  S  )  -  /  f(  P  I  H  )  f(  H  I  S  )  dH. 

Since  it  is  usually  the  case  that  n2  <  nl  <  a,  it  should  be  easier  to  specify 
the  prior  oonstant  S.  It  should  be  understood  that  the  range  of  integrations 
is  the  domain  of  the  variables  to  be  integrated  out.  To  avoid  co^>lexity  in 


<-l/2)N 


xE*p(  (-1/2)  <Z(J)-FaJ)'<Z(J)-F.J>/dJ  ). 

Bare,  the  symbol  *=  is  again  used  to  denote  the  proportionality. 

Minns  twice  the  log  likelihood  is 
(4.1.5)  •  -2  In  f(  I  I  P.  A,  D  ) 

-  tr(  Y-FA*  )D-1(  Y-FA*  )'  +  N  In  IdI  +  const. 

■  ]  +  N  In  |d|  +  const., 
where 

LAFt  -  <irA£1|'D‘1<2r«1>- 

■  5j=l^  'SfAj  J  +  N  1,1  IdI  +  const., 

where 

^aj  -  ^(JrFsj>,<i(j)‘FSj,/dj- 

In  the  following  sections  the  subscript  i  always  refers  to  the  observations, 
and  the  sobscript  j  to  the  variables. 


Prior  Distributions 

In  general  there  are  three  ways  to  express  our  prior  beliefs  about  the 


parameters  of  interest  in  a  Bayesian  estimation  procedure.  First,  if  we  have 
strong  knowledge  of  the  parameters  we  use  an  informative  prior  distribution 


is  the  N  x  1  vector  of  centered  observe t ions  for  the  variable  j 
j  ia  the  N  x  1  vector  or  the  error  teraa  for  the  variable  j, 

F  -  [  f  ,  f^,  ....  f^  ]'  ia  the  N  x  r  factor  score  matrix. 

Collectively,  ve  any  write, 

(4.1.3)  T-FA'+O, 

where 

1  "  1  *(1)*  *(2)*  “•*  *(p)  1 

"l*l**2 . IN1''  Nxp* 

°  "  1  2(1)*  2(2)*  *••*  2(p)  1 

’  1  2r  Sj . V'*  Nxp. 

Therefore,  the  likelihood  of  F,  A,  and  D  given  Y  can  be  written  as 

(4.1.4)  f(  Y  I  F.  A.  D  )  — 

|D| (_1/2)n  ExP(  (-1/2)  trl  (YHFA* )D_1(Y-FA* ) *  ]  ) 

—  f<  Xi  I  F,  A,  D  )  ), 


where 


a  -  (1/N)Y*1n, 


and  do  not  treat  it  as  a  nodal  par am# tar  in  order  to  have  simpler  fora. 
Therefore,  it  should  be  understood  that  the  observation#  are  centered,  i.e 
each  obaerved  variable  has  a  aero  saaple  swan. 

Ye  state  the  aodel  as  follows. 

(4.1.1)  xi  -  Af  +  ui,  1-1.2.....N. 

u.  :  N  (  0  ,  D  ),  i=l,2, ... ,N,  iid, 

-i  p  -p 

where 

2^  is  the  p  z  1  vector  of  the  ooluan  centered  observations  for  subject  i. 
u^  is  the  p  x  1  vector  of  the  error  teras  for  subject  i, 

«i  -  I  fA1.  ....  fir  1'.  i-1.2 . N. 

is  the  r  x  1  factor  score  vector  for  observation  i. 

A  ■  I  i2*  •••*  Ip  1 ' 

where  a^  -  I  a^.  a^,  ....  ajr  1 J-l,2....,p, 

is  the  r  x  1  factor  loading  vector  for  the  variable  j. 
is  the  p  x  r  matrix  of  the  factor  loadings, 

and 

D  “  diagt  d. ,  d. ,  ....  d  ]  is  the  p  x  p  diagonal 

l  z  p 

dispersion  aatrix  of  the  error  tern. 

The  diagonality  of  the  D  aatrix  enables  us  to  reexpress  the  aodel 
variable  wise  as  follows. 

(4.1.2)  -  F#j  +  u(j),  j«l,2,...,p. 


2(j)  !  V  -N’  dj*N  J"1,2 


P,  iid 
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CHAPTER  IV 

THE  BAYESIAN  FACTOR  ANALYSIS 


In  this  chapter  a  new  Bayesian  factor  analytic  model,  where  the  factor 
scores  as  well  as  the  factor  loadings  and  the  error  variances  are  treated  as 
one  of  the  paraawters  to  be  estimated,  is  developed.  The  presentation  is  folly 
Bayesian  in  the  sense  that  all  the  parameters  haws  prior  distributions  and  the 
inference  is  based  on  their  posterior  distributions.  After  the  description  of 
the  model  and  the  prior  distributions  the  posterior  joint  distribution  of  all 
the  parameters  is  derived.  Then,  the  joint  distribution  is  marginalised  and/or 
conditiona lined  to  obtain  the  point  estimates.  This  enables  us  to  reduce  the 
factor  score  indeterminacy  problem  to  the  usual  problem  of  the  choice  of  point 
estimate,  namely,  mean,  mode,  or  something  else  of  the  posterior  distribution 
of  the  factor  scores.  Because  the  EN  algorithm  and  its  variatinos  are  used  for 
the  marginalization  of  the  posterior  joint  distribution  a  brief  description  of 
the  algorithm  is  provided  after  the  derivation  of  the  posterior  distribution. 


The  Model 

In  this  chapter  a  model  similar  to  the  one  used  in  the  *ixed  factor 
analysis  is  used.  Since  m  is  defined  as  the  grand  swan  we  replace  it  by  the 

sample  sman,  namely. 


section.  Tims,  within  the  frssiework  of  MLE  it  is  impossible  to  estimate  the 
parameters  of  the  general  congeneric  model.  Some  modifications  such  as  the 
across  test  homoscedasticity  in  (3.1.9)  or 


(3.2.7)  a.  -  1,  j=l,2,...,p, 
J 


and 


is  proportional  to  the  observed  score  variance 


of  test  j, 

are  necessary  to  deal  with  the  form  more  general  than  the  parallel  test  model. 
With  the  former  assumption,  with  some  restrictions  concerning  the  scale  and  the 
origin  of  the  true  score  such  as  (3.1.2)  and  (3.1.3),  we  have  the  first  eigen 
vector  of  JYY'J  as  the  MLE  of  f ,  and  with  the  latter,  we  have  B) . 


Therefore,  if  we  are  to  calculate  a  composite  score  based  on  the  general 
congeneric  test  model,  the  Bayesian  unidimensional  FFA  model  proposed  in  the 
next  chapter  is  the  only  possible  way.  It  should  be  also  noted  that 
Lindley(1971a)  has  proposed  a  Bayesian  solution  to  the  parallel  test  model  with 
an  exchangeable  prior  distribution  of  the  fsetor  score. 
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other.  This  it  sometimes  referred  to  the  congeneric  test  model ,  Kristof (1974) , 
Feldt(1975),  and  contains  the  parallel  teat  model  or  the  tau-equivalent  test 
Diode 1  as  a  special  case.  That  is.  if  all  the  a's  are  equal  we  have  the 
essentially  tan-equivalent  test  model,  if.  in  addition,  all  the  a's  are  equal 
we  have  the  tau-equivalent  test  model,  and  if,  in  addition,  all  the  unique 
variances  are  equal  we  have  the  parallel  test  model.  Although  classical  test 
theory  does  not  assume  the  strong  form  of  the  distributional  assumption  nor 
within  test  hoaoscedastisity  such  as 

(3.2.3)  u^  :  N(0  ,  dj ) .  i-l,2....,N,  j«1.2....,p. 

we  assume  this  to  facilitate  the  comparison.  This  assumption,  which  specifies 
the  characteristic  of  the  error  term  associated  with  each  test,  does  not  seem 
to  be  strong  cosqiared  to  the  normality  assumption  of  each  subject's  true  score 
used  in  the  RFA  model. 

Vith  assumption  (3.2.3)  and  the  parallel  test  assumption, 

(3.2.4)  aij  »  0,  j«»l,2,...,p, 

(3.2.5)  .j  -  1.  j-1.2 . p. 

and 

(3.2.6)  dj-d.  j-1.2 . p. 

it  can  be  shown  that  the  MLE  of  the  factor  score  is  given  by  A)  in  the  exaaple 
above . 

However,  if  we  allow  the  unique  variance  to  vary  freely  in  order  to  deal 
with  the  tau-equivalent  models  it  can  be  shown  that  the  MLE  does  not  exist  even 
with  the  restrictions  (3.2.4)  and  (3.2.5).  This  is  due  to  the  same 
unboundedness  of  the  likelihood  function  described  in  the  previous 


(4.3.4) 

(4.3.5)  LpAj  *  (^p-Ft^'^jj-F.^/dj. 

(4-3-<>  S  ■  L-i[  %  >• 

(4.3.7)  -  1^,  +  N  In  A. 

=  -n  ln(*/2)  +  2  In  Gumi(n/2)  +  (N+n+2)  In  d  +  s/d  , 

J  J 

and 

(4.3.8)  Lg  =  -2  In  f(  H  I  S  ). 

Unlike  the  likelihood  in  Chapter  III  the  posterior  joint  distribution  is 


not  unbounded  above  due  to  the  tern  s/d 


j 


introduced  by  the  prior  distribution 


of  the  error  variance.  However,  it  is  found  that,  unless  we  have  the  value  of 
s  which  is  comparable  to  the  magnitude  of  the  residual  sun  of  squares, 

RSSj  - 

the  joint  mode  of  (4.3.1)  exists  in  the  region  whore  some  of  the  error 
variances  are  close  to  zero. 

Note  that  (4.3.1)  also  gives  various  conditional  distributions  by  droping 
the  factors  which  consist  purely  of  the  paraamters/hyperpsraaeters  on  whioh  we 
would  like  to  condition.  In  terms  of  (4.3.2),  ninus  twice  the  log  posterior 
conditional  distribution  of,  for  example,  the  factor  scores  and  the  factor 
loadings  given  the  error  variances  and  the  hyperparameters  is  given  by 

WW“nst- 


Posterior  Marginal  Distribution! 

When  the  globally  exchangeable  prior  ia  used  either  for  the  factor  scores 
or  for  the  factor  loadings,  following  two  marginal  distributions  can  be  derived 
analytically. 

(4.4.1)  f(  F,  D.  H  I  S.  Y  )  -  /  f(  F.  A.  D,  H  I  S,  Y  )  f(  A  I  S  )  d  A 
=-  /f(Y|F,A,D)f(A|HA)dA  f (FlHp)f (DlH^f (H|S) 

-  1  1-1/2  * 

Exp(  (-l/lH^jj-FaJ'CFC^'+djy^^jj-Fa)  )  ] 
x  ftFlHp)  f (D|e.,,)  f(HlS). 

(4.4.2)  f(  A.  D.  H  I  S,  Y  )  -  /  f(  F,  A.  D,  H  I  S,  Y  )  f(  F  I  S  )  d  F 
/f(Y|F.A,D)f(F|HF)dF  f (AtHA)f (D|B^)f(H|S) 

=  I  AC^A'  +  D  |{_1/2)N  £xp(  (-1/2)  tr(  Y'Y  D_1  )  ) 
x  f(A|HA)  f (Dilip)  f(H|S). 

Also,  with  the  globally  exchangeable  prior  of  the  error  variances,  we  have 

(4.4.3)  f(  F,  A.  H  I  S,  Y  )  -  /  f(  F.  A,  D,  H  I  S,  Y  )  d  D 

“  (RSSj  +  a)-(N+n)/2  ] 

x  p  (t/2)^2  Ganna ((N+n)/2)  /  Ganna(n/2) 
x  f(Fl^)  f(A|HA)  f(H|S), 

and  Gamma(x)  is  the  Gaaaut  function. 


Note  that  the  posterior  marginal  distribution  of  A,  0,  and  H  is 


essentially  equivalent  to  the  faaiiliar  likelihood  on  which  the  nsnal  MLE  is 
based.  In  this  sense  our  approach  includes  both  the  random  and  the  fixed 
factor  analytic  model  reviewed  in  the  previous  chapters. 
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Theoretically,  when  we  are  interested  in  the  estimation  of  the  factor 
scores  and  the  factor  loadings  we  may  be  able  to  use  the  joint  mode  of  the 
marginal  distribution  of  the  factor  scores  and  the  factor  loadings  as  the  point 
estimates.  However,  since  the  distribution  given  in  (4.3.3)  has  the  same 
tendency  as  the  joint  posterior  distribution  given  in  (4.3.1),  it  is  not 
desirable  to  use  the  mode  of  (4.3.3)  as  the  estiamte.  That  is,  the  mode  is 
close  to  the  region  where  some  of  the  d ,'s  are  zero.  (It  can  be  shown  that 


(4.4.1)  also  has  the  same  characteristic.)  On  the  other  hand,  the  marginal 
distributions  given  in  (4.4.1)  and  (4.4.2),  contain  the  location  paraawter  F  or 
A  and  the  scale  parameter  D  together.  Therefore,  the  modes  of  these  densities 
are,  in  this  sense,  still  joint  audes  (  e.g.  (F,D)  or  (A.D)  )  and  suffer  the 
criticism  of  O'Hagan  (1976),  or  Fienberg  (1972).  Also,  Knbin  and  Thayer 
(1982),  who  derived  the  MJB  from  the  equivalent  likelihood,  note  that 
’estimation  of  variances  should  be  from  their  amrginal  likelihood.’  The  point 
is  that  the  joint  mode  or  the  joint  MLE  tends  to  underestimate  the  error 
variances  due  to  the  lack  of  degrees  of  freedom  adjustment,  often  resulting  in 
the  Haywood  case.  Therefore,  some  additional  marginalization  is  necessary. 

Since  we  have  the  hyperparameters  in  the  posterior  distribution  we  first 
marginalize  all  the  parameters  in  order  to  have  point  estimtes  of  the 
hyperpa ream  ter s.  That  is,  the  point  estimates  of  the  hyperparameters  are  given 
by  the  mode  of 

(4.4.4)  f(  H  |  S.  Y  )  -  /  /  /  f(  F,  A,  D,  H  I  S,  Y  )  dF  dA  dD. 


Then,  conditioned  on  the  obtained  values  of  the  hyperparameters,  the  point 
estimate  of  the  error  variance  is  derived  as  the  mode  of 

(4.4.5)  f(  D  I  H,  S,  I  )  -  /  /  f(  F,  A,  D  I  H,  S,  Y  )  dF  dA. 

Then,  conditioned  on  the  point  estimates  of  the  error  variances  and  the 
hyperparameters,  we  estimate  the  factor  scores  and  the  factor  loadings  as  the 
mode  of 

(4.4.6)  f(  F,  A  I  D.  H.  S,  Y  ), 

(4.4.7)  f(  F  I  D,  H.  S.  Y  )  -  /  f(  F,  A  I  D,  H.  S,  Y  )  dA, 

or 

(4.4.8)  f(  A  I  D,  H,  S.  Y  )  ■»  /  f(  F,  A  I  D,  H,  S.  Y  )  dF. 

As  O'Hagan  (1976)  noted,  the  above  conditional  modes  are  considered  to  be 
better  estimate  than  the  marginal  mode.  Vhen  the  main  interest  of  the  analysis 

is  the  estimation  of  the  factor  loadings,  the  mode  of  (4.4.8)  should  be  used. 

Vhen  the  main  interest  of  the  analysis  is  the  estimation  of  the  factor  scores 
the  amde  of  (4.4.7)  should  be  used.  Finally,  if  both  are  of  interest  the  mode 
of  (4.4.6)  should  be  used. 

Since  analytic  marginalization  is  impossible  we  use  some  variations  of  the 
EM  algorithm,  Dempster,  et.  al.  (1977),  in  the  later  section.  A  brief 
descriptions  of  the  EM  algorithm  and  its  variations  are  presented  in  the  next 
section. 

The  EM  Algorithm  and  Its  Variations 
Given  a  set  of  random  variables  (  u  ,  v  ),  and  their  joint  density 
function,  f(  u,v  I  g  ),  where,  £  is  the  parameter  which  determines  the  density. 
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tbt  EM  algorithm  can  ba  used  to  find  the  maximum  likelihood  estimate  OLE)  of  £ 
on  the  basis  of  the  narginalized  likelihood 

g(  !  I  {  )  •  /  f(  B,  T  |  2  )  d  V. 

In  the  original  article  by  Dempster,  et.al.(1977),  a  set  (  u,  v  )  is  refered  to 

oomplete  data,  u,  incomplete  data,  and  y,  missing  data.  The  terminology  makes 

sense  when  ▼  represents  the  portion  of  observations  which  are  missing  but  in 

more  general  applications  it  shonld  be  understood  that  the  random  variables  to 
be  integrated  out  are  refered  to  as  missing  data.  (  See  the  following 
examples.)  The  algorithm  is  an  iterative  process  consisting  of  two  steps, 
namely,  the  E-step  and  the  K-step.  Aa earning  that  an  estimate  of  g,  say,  g^, 

is  given,  for  each  iteration  the  E-step  calculates  the  conditional  expectation 
of  ln(  f(  u,v  I  g  )  )  given  v  and  g^,  namely, 

Ey(ulg)  »  /  ln(f (u,vlg))f (vlu.gg)  dy, 

and  in  the  M-step  Ey(ulg)  is  maximized  with  respect  to  g  assuming  the 

e 

parameters  of  the  conditional  distribution  of  v,  say  y  ,  are  constant.  That 

e 

is,  although  the  parameter  v  ,  which  is  a  function  of  u,  and  g^,  is 
included  in  Ey(ulg)  it  is  treated  as  a  constant  when  the  derivative  of  Ey 
is  taken.  Notation  such  as  Ey(ulg)  should  not  be  confused  with  the  simple 
expectation  sign  El,].  The  successive  application  of  these  two  steps  will 
usually  result  in  the  MLE  of  the  marginal  likelihood  g(u|g)  when  the  process 
converges.  It  is  assumed  that  some  initial  value  of  g  is  given  prior  to  the 


iteration.  See  Wu  (1983)  for  the  conditions  required  for  the  convergence. 

When  the  complete  data  hat  the  distribntion  which  belongs  to  the  regnlar 
exponential  family  the  conditional  expectation  of  the  sufficient  statistics. 
t(u.y).  may  be  used  for  the  calculation  since  the  log  likelihood  can  be  writen 

as 

In  f(u,vl£)  *  +  +  const., 

where  a(p)  is  a  function  of  £  only, 

and  const,  is  a  constant  tens  which  does  not  include  j>. 

When  £  has  a  prior  distribution,  f<£lH),  the  EM  algorithm  can  be  used  to 

find  the  posterior  mode  of 
g(  £  I  u,  H  )  *  /  f(  £,  v  I  u,  H  )  d  v 

=  /  f(u,v|£)dy  f(ElH). 

In  this  case,  the  E-step  calculates  the  conditional  expectation  with  respect  to 
f(vlu.£,H),  that  is, 

Eyiflu.H)  *  /  ln[f(u,vl£)]f(v|u,£,H)dy, 

and  the  M-step  maximises  this  with  respect  to  £  treating  the  parameter  of  the 

conditional  distribution  of  v  as  a  constant.  When  the  process  converges  we 

presume  to  have  the  posterior  marginal  mode  of  £  with  y  integrated  out. 

Now,  suppose  we  have  an  observation  Y,  whose  distribution  is  described  by 
f(Y|P),  and  a  hierarchical  prior  distribution  of  P  of  the  form  f (P|H)f (HlS) , 
where  H  is  the  first  stage  hyperparameter,  and  S,  the  second  stage 
hyperparameter.  The  posterior  joint  distribution  of  P  and  B  is  given  by 


f(  P.  H  I  S  )  -  f(Y|P)  f (P|H)  f(H|S). 
and,  the  potter ior  marginal  distribution  of  B  with  P  integrated  out  by 
g(  B  I  S  )  -  /  f(P.BlS)  dP. 

In  order  to  estimate  the  modal  value  of  B  of  the  marginal  posterior 
distribut ion  g,  we  can  also  use  the  EM  algorithm  by  identifying 
u  «  Y,  v  *  P,  and  j  =  H, 

Then  the  minimization  of  minus  (twice)  the  log  likelihood  is  preferred  to 
the  maximization  due  to  its  simplicity,  its  expectation, 

*  (-2)  /  Inf (u.vlg)f (vlu,j)dv,  should  be  calculated  in  the 

E-step  and.  in  the  M-step.  it  should  be  minimized  with  respect  to  £. 

Ohe  of  the  advantages  of  the  EM  algorithm  over  the  direct  maximization  of 
g(u|£)  is  its  simplicity.  In  the  usual  application,  where  (u,v)  have 

multivariate  normal  distribution,  Ey  is  much  easier  to  work  with.  That  is, 

the  siaximization  can  be  done  analytically.  Another  advantage  is  its 
flexibility.  Bock  and  Aitkin  (1981)  used  its  variation  in  order  to  have 
marginal  MLE  of  the  item  parasmters  under  the  latent  trait  model  where  the 
conditional  expectation  of  v  is  plugged  into  the  complete  log  likelihood 

instead  of  taking  its  expectation.  We  propose  another  variation  in  order  to 
smrginalize  with  respect  to  several  sets  of  missing  data,  say,  v^,  v^.  ... 

in  the  later  section. 

In  the  following  exaaqiles  the  actual  application  of  the  EM  algorithm  will 
be  explained.  The  first  example  is  a  straightforward  one  where  some  of  the 
observations  are  missing.  The  second  example  shows  how  we  treat  a  subset  of 
the  parameters  as  the  missing  observations. 


5’ 


Example  1:  Estimation  of  Multivariate  Normal  Parameters 

Let  Z  be  the  n  x  p  data  matrix  containing  n  observations  from 

N  (  m  ,  V  ) .  Now,  assume  a  part  of  Z  is  missing.  That  is,  let 
P  “ 


*l' 


where  n*nl+n2,  and 

x^,  i«l,2, ... ,n^,  be  p  x  1  vector  of  observation, 

1^,  i»l,2, . . . .Uj.  be  p  x  1  vector  where 

z'i  ■  i  *t'  *  V  i. 

ai>  i*l,2,...,n2,  and  b^,  i-l,2,...,n2,  are, 

respectively,  pi  x  1  and  p2  x  1  vectors  and  p*pl+p2. 

It  should  be  understood  that  b^’s  are  missing. 

With  the  notation  used  in  the  previous  section,  we  have. 


o*l  •  i 

,.,nl,  and  a^,  1*1,2,., 

•  • »o2  J i 

y  ■  [  b1#  i*l,2, ., 

».,n2  ], 

and 

£  “  [  m  and  V  ], 

and  wo  would  like  to  estimate  a  and  V  on  the  basia  of 
the  marginal  likelihood 

g(  X.a^  i«l,2,....n2  |  m.W  )  -  /  f(  Z  I  m.W  )  d  b. 

See  Orchard  and  Woodbury  (1972)  for  theoretical  basis  of  this  method. 

Now,  minus  twice  the  log  likelihood  of  (m,W)  given  Z,  the  complete  data,  is 

L  -  n  lnltl  ♦  (x^-m)  'W  ^z^-m)  )  +  const. 

The  B-step. 

1.  Conditional  Distribution  of  b^'s  given  X  and  A. 

The  conditional  distribution  of  b ^  given  X  and  a^  is 

‘i 1  ‘  :  V!vv*)- 

where 

-i  “  ^2  + 

V'  " 

where 

■  -  [  Sj*,  52'  ]\ 

1  i  pi  1  x  p2 
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+  trW22/, 

-  (  j.*-*  X*-m  )  +  tr**V. 

where  ^  *  [  a^'.  1*. 

(See  Appendix  for  the  expectetion  of  quadratic  fora.) 

Therefore. 

E(  L  I  I.  £l* i*l,2. . . . .n2  )  -  Egt  a.f  I  X.A  ) 

*  5i-l^  ) 

r  N  e  -1  • 

+  2i-l(  (zfs),w  <*1#  > 

♦  a2  trf*V  +  const. 

The  M-step 

By  differentiating  Eg(  n,W  |  X.  a^  i-1,2, . . .  ,n2  ) 

with  respect  to  m  and  W*"^,  ij=  11,  12,  and  22, 

(  see  the  Appendix  for  the  Matrix  Differentiation,  ) 
and  solving  the  resulting  normal  equations,  we  have 

n+  *  (l/n)l  X*.  Y*’  ]  1  , 

—  -n 

-  (1/n)  C^,  ij  -  11.  12,  and  21, 

and 

WM  "  (1/a)  (  C22  +  02  V*  >• 


where  C  is  the  sanple  mean  corrected  sum  of  squares  and  cross  product 
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matrix  and 


'»  are  its  partition  similar  to  the  one  for  W. 

e  * 


Note  that  b^’s  snd  V  are  treated  as  constants  when  taking  the 
derivatives.  The  successive  application  of  these  two  steps  vill  results  in  the 
marginal  MLE  of  m  and  W  with  the  missing  data  integrated  out,  namely,  m+  and 

W+,  at  the  convergence  point. 


Example  2:  Two-Way  Random  Effect  ANOVA  without  Interaction 

Let  us  consider  the  two  way  random  effect  ANOVA  model  without 
interactions. 

yij -  *i  ♦  V  V  . . . . ’• 

where 

e _  :  N(  0.  sE  ),  i*l,2»....p,  j*l,2,...,q,  iid, 
a4  :  N(  m,  s^  ),  i~l,2,...,p,  iid. 


bj  '  N(  °*  *B  >•  J-1’2 . *  1W* 

Note  that  the  overall  mean  is  taken  care  of  by  the  mean  of  parameters,  a^, 

namely,  m.  Also,  to  avoid  the  complexity,  no  replication  is  made. 

Writing  the  model  using  the  linear  regression  form,  we  have 
j  -  Xg  +  e, 

where 

y„„  1 '•  P*  x  1* 

PQ 


x  -  l  yn.  jn . ylq.  y21* 


•  •eg 


iq  iq  J.  pq  *  (p+q). 


§  ■  l  b*  ] *.  (p+q)  X  1. 


£  *  l  t  *  • • • •  Xp  1 ' #  P  x  1 , 

b  -  [  b, ,  b_,  ....  b  ] q  x  1, 

*  12  q 

g  :  N  (  b  .  C  ). 

p+q  - 

a  =  [  ml  ',  0  *  (p+q)  x  1. 

-p  -q 


•A1 
A  p 


V,  ' 


.  (p+q)  x  (p+q). 


By  integrating  g  out  analytically,  we  see  tbat  y  has  the  pq-variate  normal 


distribution  with 


nean  of  j  -  m. 


and 


««v<  y,rykl  )  -  V*A  *  Vb  *  ‘VjlV 


where  d^  is  the  Kronecker's  delta  notation. 


Instead  of  working  directly  on  the  marginal  likelihood  implied  above,  we 


proceed  as  follows. 

Let  ns  assume  that  I  y,  g  ]  is  the  cosq>lete  data  and  g  is  the  missing 


part.  That  is 


u  »  y,  v  “  g,  and  2  “  t  sA»  Sg,  Sg.  and  ■  ] . 

Minns  twice  the  log  likelihood  of  the  complete  data  is 
L  =  pq  ln( Sg)  +  (  j-Xg  ) * (  )  /  $E 

+  In  Id  +  (  |Hn  ) 'C  *(  g-m  )  +  const. 


The  E-step 

1.  Conditional  Distribution  of  g  given  y. 

By  rearranging  two  quadratic  forms  we  see  that 


6  I  r  :  Np+5(  6*.  V*  ). 


where 

•  _i  _i  _i 

V  «=  (  s AX'X  +  C  A  )  \ 


and 

*  *  -1  -1 

g  -  V  (  «E  a'jt  +  C  A«  ). 

*  I  a  * ,  b  '  1 .  say. 

(  See  the  completion  of  sum  of  squares  trick  in  the  Appendix.  ) 
2.  Conditional  Expectation  of  L  given  y. 

E(  L  I  jr  )  -  Ep(  m,  s^Sg.Sg  I  Z  > 

=  pq  ln(sg)  +  (y-Xg  )/*g  +  trX'XV  / Sg 

+  In  |C|  +  (g*-m)  ,C"1(g*-m)  +  trC_V  +  const 
*  pq  ln(Sg)  +  (  (j-Xg'j't^-Xg'j+trX'XV*  )  /  Sg 
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and 


*j  * (f  'F 1  s  •*«»• 

Therefore,  conditioned  on  d^ ,  H,  and  S,  we  have 


(4.S.3.3.)  Sj  I  D.  H.  S.  Y  :  0)  ). 

and  these  conditional  distributions  are  independent  for  jsl,2,...,p.  The  joint 


e 

density  is  denoted  by  f  (  A  I  D,  H,  S,  Y  ),  which  is  the  product  of  each 
normal  density  in  (4. 6. 3. 3).  When  some  of  the  elements  of  D  is  zero  the 

* 

calculation  of  Q  . 's  should  be  performed  using  the  matrix  inversion  formula 
2  in  the  Appendix. 

As  we  noted,  depending  on  the  method  of  quasi  marginalization,  we  can 
either  take  the  expectation  of  Ep(A,D,H|S,Y)  with  respect  to  the  conditional 

distribution  of  A  or  we  can  analytically  integrate  A  out.  First  we  consider 
taking  the  expectation  of  E^,. 

(4.6. 3. 4)  E^p.(D,Il|S,Y)  =  /  Ep(A,D,H|S,Y)  f*(A|D,H,S,Y)  d  A 

■  lu 1  <wwvV d  Sj 1 

+  +  h)  +  Si  +  const' 


v  n  **  *  e  *  * 

=  >,,[  (  RSS.  +tr[(F  'F  +N  V  )Q.l 
4-j=l  J  J 

+  N  a*’V*a*  )/dj  ] 

+  la*  +  h*  +  S  +  Si  +  const-' 


'he 


Now  consider  Ep(  A,  D,  H  I  S,  Y  )  in  (4. 6. 2.1)  as  minus  twice  the  log 
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posterior  density  of  A,  D,  and  H  given  S  and  Y  and  denote  the  density  by 

f*(A.D,HlS,Y) .  Then,  by  writing  variable  wise,  we  have 
(4. 6.3.1)  Ep(  A.  D.  H  I  S.  Y  ) 


+  Na.'V  a.)/d 
“J  ~J  j 


] 


+  h*  +  h  +  4  +  con8t' 

where 

rssI  ■ 

Again  using  the  projection  operator  trick  1  and  the  sum  of  squares 
trick  in  the  Appendix  twice,  we  have 


(4. 6. 3. 2) 


^Aj 


*+LAj 


+  N 


'V  a  /d 

=j  j 


*  *-i  • 

(a  -a .) 'Q.  (a  -a.) 
-J  "J  J  ~j  "J 


+  *  *  +  -1 
+  a.’F  'F  a./d.  +  a'C.  a 
-j  -j  j  -  A  - 


*  *  'Qrisj  * 


where 


W  -  BSSj  '  V 

Q*  *  (  (1/d  )F*’F*+c”1+(N/d.)V*  r\ 
j  J  A  J 

a*  -  Q*(  (l/d.F*'y.  ..-K^a), 

-J  j  J  *<J>  A  - 

•  *  *  -i  • 

P  =  ^  -  F  (F  ’F  )  TF  ' , 


‘WWW  1 


*  “Ft"  CFlVi  ♦W,IW- 

The  sign  #k  denotes  the  snm  over  those  i's  which  belong  to  subgroup  k. 

In  this  case,  the  quantities  to  be  stored  ara 

(4. 6. 2. 6)  Y'kF*  -  (X+kfk<^k'V"lA)\* 
and 

(4. 6. 2. 7)  F*'F*  -  V*[  A'D_1  Yk'Yk  D_1A 

*  *  site*0'1* 

*  '<• 
where 

w  ■  W  4  '• 

Therefore,  jr+k  and  Yk'Yk,  k*l,2,.,.,G^  are  sufficient  in  order  to 
estimate  the  factor  score  weight. 

Also,  all  the  expressions  in  the  following  sections  which  contain  the  term 
N  V  must  be  corrected  in  the  similar  manner  as  (4. 6. 2. 5)  as  well  as  L^,t. 


Quasi  Marginalization  with  respect  to  the  Factor  Loadings 


inversion  formula  1  in  the  Appendix.  (See  Technical  Notes  section  below.)  As 


noted  before  in  the  exasg>le  of  the  EM  algorithm,  the  MLE  by  the  EM  algorithm 

does  not  produce  negative  estimates  of  the  error  variances  since  the  V 

matrix  is  positive  semi  definite. 

•  e  e  e 

Also,  since  F  always  appears  as  Y'F  and  F  'F  it  is  not 

necessary  to  store  all  the  factor  score  estimates.  That  is, 

(4. 6. 2. 2)  Y'F*  *  Y'Y  D_1  A  V*. 
and 

(4. 6. 2. 3)  F*'F*  =  V*  A'  D_1  Y'Y  D_1  A  V*. 

* 

are  to  be  stored  in  the  calculation.  With  those  quantities,  RSS^  in 

(4. 6. 3.1)  can  be  reexpressed  as 

(4. 6. 2. 4)  RSS*  =  [Y*YJ  -2[Y*F*]  '»  '[F*^*]^. 

In  this  sense  the  mean  corrected  sum  of  squares  and  cross  product  matrix  Y'Y 
sufficient  for  the  estimation.  (  It  is  not  sufficient  for  the  estimation  of 

-1  * 

the  factor  score,  but  sufficient  for  the  scoring  weight  W  *=  D  AV  .) 

When  the  locally  exchangeable  prior  is  used  for  the  factor  scores 

(4. 6. 2.1)  should  be  replaced  by 

(4. 6. 2. 5)  Ep(  A.  D.  H  I  S,  Y  ) 

-  W  *  “<  *’d'1a  1"'  1  > 

+  +  Lp  +  l^j  +  const., 

where 

hv  >■ 


is 
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(4. 6. 1.3)  F*  -  (  l^’C^  4  Y^'1*  )  V\. 

When  some  of  the  elements  of  D  ere  zero  the  calculation  of  P  should  be 
performed  using  Lawley's  trick  in  the  Appendix. 


Conditional  Expectation  of  Minus  Twice  the  Log  Posterior  Density 
From  (4. 6. 1.2)  and  (4.3.2)  we  have 
(4.6.2. 1)  Ep(  A.  D.  H  I.  S.  Y  >  -  /  L  f(  F  I  A.  D.  H,  S.  Y  )  d  F 

•  L-i[  w  *  **i. 1  +  N  «A,D'l*v* 

+  N  tr  c£*V  +  N  lnlCpI  +  LA  +  +  const, 

where 

W  -  <  >• 

and 

hi-  *  C'hX- 

*  Ljj^D*  +  N  tr[A'D  XAV  ]  +  +  LA  +  +  const, 

where 

W  -  -  "**'  >  »“  < 1  -  F’A'  >'• 

H"  ■  )i-il  hi- 1  * N  “  SV  * N  1"|CFI 
Note  that  if  we  minimize  this  expectation  with  respect  to  A  and  D  with 
uniform  prior  of  A  and  D,  we  have  the  same  estimate  as  given  in  Rubin  and 
Thayer  (1982).  This  can  be  shown  by  using  Lawley's  trick  and  the  matrix 


(4.6.1.2)  f4  I  A,  D.  H,  S  :  f£,  V  ) 


and  these  conditional  distributions  are  independent  for  i*l,2,...,N. 

The  joint  density,  which  is  the  product  of  the  normal  density  given  above 
is  denoted  by  f(  F  I  A,  D,  H  ,  S,  Y  ). 

Ve  also  write 

(4. 6. 1.3)  F*  -  Y  W, 
where  W  =  D_1AV*. 

Then  the  locally  exchangeable  prior  is  used  for  the  factor  scores, 

(4.6. 1.2)  and  (4. 6. 1.3)  should  be  replaced  by 

(4.6. 1.4)  f *  =  V*  (  A'D_1yi  +  C{£fk  ), 

if  the  observation  i  belongs  to  the  k^group, 

where 

V*  -  (  A-D'1*  ♦  <£  I’1. 

e  * 

Collectively,  we  also  denote  those  f^’s  in  (4. 6. 1.4)  as  F  .  If  the 
observations  are  arranged  such  that  the  first  n^  observations  belong  to  the 
group  1,  the  second  observations  belong  to  the  group  2,  and  so  on,  we 
write 

y  *  I  v  *  v  •  Y  '  1  * 

1  1  *  2  *  '*•'  GF  J  ' 

and 

F- t  V'  V . V  1'« 

e 

The  same  partition  should  be  used  for  F  so  that  we  may  write 


as  well  ss  the  estimation  procedure  for  the  conditional  mode  of  a  subset  of  the 


paraaMters.  Unless  noted,  the  use  of  a  globally  exchangeable  prior 
distribution  for  both  the  factor  scores  and  the  factor  loadings  is  assumed. 


Conditional  Distribution  of  F  Given  A,  D,  H.  S,  and  Y 
By  collecting  and  from  (4.2.14)  and  (4.3.4),  respectively, 

and  using  the  projection  operator  trick  2  and  the  sum  of  squares  trick  in  the 
Appendix,  we  have 
(4.6.1.1)  L^pi  ♦  Lpi 

*  +(f.-f!)'A'D"1A(f.-ft> 

*i  -i  -i  -i  -i 

♦  *  1“|CFI- 

e  *-i  • 

*  (fj-f  )'V  (ft-f  ) 

+  the  term  not  containing  f., 
where 

P  -  I  -  d"1a(a,d~1a>"1a,d_1, 

P 

f*  -  (A'D^Al^A'D-1^, 

•  -1  -1  -1 
V  ■  (  A'D  *A  +  Cp,1  )  \ 

and 

•  •  -1 

*i  -  V  A'D 

Therefore,  we  have 


where  the  factor  scores,  the  error  variances  and  the  factor 


loadings  are  integrated  ont  in  respective  order  in  the  E-atep. 

Vaf 

where  the  factor  scores  and  the  factor  loadings  are  integrated 
out  in  respective  order  in  the  E-step  and  the  error  variances 
are  integrated  out  analytically  after  the  E-step, 

vw 

where  the  factor  loadings  and  the  factor  scores  are  integrated 
out  in  respective  order  in  the  E-step  and  the  error  variances 
are  integrated  oat  analytically  after  the  E-atep, 

where  the  factor  scores  are  integrated  oat  in  the  E-atep 
and  the  factor  loadings  and  the  error  variances  are  integrated 
oat  analytically  in  respective  order  after  the  E-atep. 


Point  Estimation 

In  this  section  the  E-step  is  first  described.  First,  the  derivation  of 

*W*  eadf'  Vaf*  aad  W  where  the  fi0tor  ,oore*  *re  fir,t 

marginalised,  is  presented.  The  derivation  of  *nd 


EpEp^»  where  the 


factor  loadings  are  first  aiarginalised,  follows.  Then,  the  description  of  the 
M-step,  where  the  mode  of  the  hyperparaawters  are  to  be  estimated,  is  presented 
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integrate  P2  analytically  denoting  the  result  by  f  (  H  |  S,  Y  ). 

Also,  denote  its  log  by  H  I  S,  Y  ). 

The  11-step 

Maximize  Ap^E^(  H  I  S,  Y  )  with  respeot  to  H. 

For  the  next  iteration,  P2  should  be  replaced  by  the  mode  or  the  mean  of 
f*(P2.H|S.Y). 

Since  this  variation  is  different  from  the  one  proposed  before  the 
agreement  of  the  results  should  also  support  the  convergence  of  those  quasi 
marginalization  schemes. 

In  the  application  to  factor  analysis,  we  partition  the  parameter  P  into 
three  subsets,  namely, 

P  *  [  F,  A,  D  ], 

Therefore,  we  have  many  ways  to  perform  the  quasi  marginalization  by  changing 
the  order  of  integration  and  the  way  the  parameters  are  integrated  out  (  by 
analytical  method  or  taking  expectation.)  In  the  following  section,  only  six 
of  them  are  to  be  considered.  Each  of  these  variations  are  designated  by  the 
particular  form  which  is  to  be  maximized  in  the  M-step.  Namely, 

EDAF 

where  the  factor  scores,  the  factor  loadings  and  the  error 
variances  are  integrated  out  in  respective  order  in  the  B-step, 

■»A 

where  the  factor  loadings,  the  factor  scores  and  the  error 
variances  are  integrated  out  in  respective  order  in  the  E-step, 


» ■.t  t7 


■-  ■.  '.V \”  v."  *7 
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the  expectation.  When  the  E-step  is  applied  in  the  next  iteration,  the  first 
part  of  Ep^pj  should  be  retarded  as  the  coaq>lete  lot  likelihood.  That  is, 

when  calculating  the  conditional  distribution  of  PI  given  P2,  H,  S  and  Y,  the 

values  of  the  function  of  P2  should  be  used  in  place  of  P2. 

The  too  expectations,  and 

inclusion  of  P2  in  PI  is  neglected  in  the  latter.  Also,  note 

Ehl2  ’  ^1  bttt  ^Pl  °  hw  where  EP1P2  denote*  th#  8“*  q“8i 
marginalisation  scheme  with  P2  integrated  out  first.  That  is,  the  conditional 
expectation  is  affected  by  the  order  of  integration.  Although  we  cannot  prove 
the  convergence  of  this  process  to  the  desired  marginal  mode,  to  see  that  the 
two  different  methods,  namely,  the  maximization  of  E^^  and  the  maximization 

of  Ep1p2«  result  la  the  same  solution  should  support  the  convergence.  If 

they  do  converge  to  the  same  solution  we  stay  as  well  conclude  that  the  desired 
marginal  mode  of  H  is  obtained  by  this  variation  of  the  original  EH  algorithm. 

The  quasi  marginalization  using  the  EM  algorithm  was  originally  suggested 
by  Tom  Leonard  (personal  communication)  in  the  following  form. 

The  E-step 

Identify  the  conditional  distribution  of  PI  given  P2,  H,  S  and  Y 

* 

and  denote  the  parameter  by  PI  *  hi (  P2,  H,  S,  Y  ). 

Evaluate  the  conditional  expectation  of  log  complete  likelihood 
with  respect  to  PI  and  denote  the  result  by  E^(  P2,  H  |  S,  Y  ). 

Consider  Epj(P2,HlS,  Y)  as  the  log  likelihood  of  the  posterior 

e 

joint  density  f  (P2.HIS,  Y)  with  PI  integrated  out  and 


E^p^,  are  different  since  the 


A  -'.'-l. 


»i 


fl 

.•'1 

.•'1 
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-  Expectation  with  respect  to  PI  of  Expectation  with  respect  to  P2  I  PI. 
However,  when  the  second  expectation  is  taken,  it  smy  be  the  case  that  it  is 

diffienlt  to  do  so  with  PI*  treated  as  a  function  of  P2,  bnt  that  it  is 

tractable  with  PI  regarded  as  constant.  If  this  were  the  case,  we  propose 
the  following  as  a  variation  of  the  original  EM  algorithm. 

The  E-step 

Identify  the  conditional  distribution  of  PI  given  P2,  H,  S  and  Y 

and  denote  the  paraaieter  by  PI  ■  hl(  P2,  H,  S,  Y  ). 

Evaluate  the  conditional  expectation  of  the  log  complete  data  likelihood 
with  respect  to  PI  and  denote  the  result  by  E^(  P2.  H  I  S,  Y  ). 

Consider  E^1(P2,H|s,  Y)  as  the  log  likelihood  of  the  posterior 

joint  density  f*(P2,HlS.  Y)  with  PI  integrated  out  and 
identify  the  conditional  distribution  of  P2  given  H,  S  and  Y  and 

•  • 

denote  the  parameter  by  P2  “h2  (H,S,Y). 

Evaluate  the  conditional  expectation  of  Epj(  P2,  H  I  S,  Y  ) 

* 

with  respect  to  P2  given  S  and  Y  treating  PI  in  the  expression  as 
constant  and  denote  the  result  by  H  I  S,  Y  ). 

The  M-step 

Maximize  Ep^^(  H  |  S,  Y  )  with  respect  to  H. 

It  is  often  the  case  that  consists  of  two  parts,  namely,  the  part 

which  corresponds  to  the  original  eoa^lete  log  likelihood  with  PI  and  P2 

*  • 

replaced  by  functions  of  PI  and  P2  ,  and  the  additional  term  doe  to  taking 


f<  P  I  B  )  »  f<  PI  I  HI  )  f(  P2  I  R2  ). 

When  the  posterior  marginal  mode  of 
j(  H  t  S,  I)  -  /  £(  P,  H  I  S,  Y  )  d  P 

-  /  /  f(  P.  H  I  S.  y  )  d  PI  dP2, 

•re  to  be  estimated  we  may  proceed  as  follows. 

The  E-step. 

Identify  the  conditional  distribution  of  PI  given  P2,  H.  S  and  Y 

and  denote  the  paraamter  by  PI  *  hl(  P2.  H,  S,  Y  ). 

Identify  the  conditional  distribution  of  P2  given  H.  S  and  Y  and 

denote  the  paraamter  by  P2  ■  h2(  H,  S,  Y  ). 

Evaluate  the  conditional  expectation  of  the  log  complete  data  likelihood 
with  respect  to  PI  given  P2,  H.  S  and  Y  and  denote  the  result  by 
E^t  P2.  H  |  S.  Y  ). 

Evaluate  the  conditional  expectation  of  E..(  P2,  H  I  S,  Y  ) 


with  respect  to  P2  given  H,  S  and  Y  noticing  that  PI  in  the 
expression  is  a  function  of  P2  and  denote  the  result  by  E^^l  H  I  S,  Y  ) . 

The  M-step 

Maximize  H  I  S,  Y  )  with  respect  to  H 

e 

treating  P2  as  constant. 

This  is  an  authentic  application  of  the  EM  algorithm  in  which  the  expectation 
is  taken  successively,  i.e.. 

Expectation  with  respect  to  PI  and  P2 


Expectation  with  respect  to  P2  of  Expectation  with  respect  to  PI  I  P2 


63 

is  that  they  cannot  be  negative.  This  can  be  shown  by  noticing  that  RSS’s  are 

* 

always  non  negative  and  V  is  positive  seal  definite.  Therefore,  the 
solution  by  the  EM  algorithm  may  be  different  from  the  one  by  the  direct 
aaxiaization  method  such  as  the  one  enployed  in  SAS  (SAS  User's  Guide: 
Statistics,  1982  Edition.)  Bow  the  algorithm  behaves  in  the  neighbourhood  of 
zero  is  not  known. 

Also,  as  many  Bayesian  statisticians  acknowledge,  the  distributional 
assunq^tion  on  [  a,  b  ]  stay  be  thought  of  as  the  prior  distribution  of  those 

effect  parameters.  (  Box  (1980),  Lindley  (1971)  Box  and  Tiao  (1973),  and 
Lindley  and  Smith  (1972).)  If  we  accept  their  view,  then,  what  we  have  done 
may  bo  regarded  as  Parametric  Empirical  Bayes  estiamtion  of  the 
hyperparameters,  where  the  hyperparameters,  m,  s^,  and  Sg  are  estimated  on 

the  basis  of  data  y.  (  See  Morris  (1983).  )  Also,  from  the  hierarchical  Bayes 

point  of  view,  the  solution  can  be  regarded  as  the  marginal  posterior  mode  of 
the  hyperparameters  when  the  hyperparameters,  m  and  C,  have  uniform  second 

stage  prior  distributions. 


Quasi  Marginalization  by  Some  Variations  of  the  EM  Algorithm 
Consider  the  hierarchical  mode 1  stated  in  the  previous  section,  that  is, 
f(  P  ,  e  I  Y  ,  S  )  “  f(  Y  I  P  )  f(  P  I  H  )  f(  B  I  S  ), 


and  suppose  the  partition  of  the  form 
P  =  (  PI,  P2  ],  H  *  [  HI,  H2  ], 
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+  p  lnis^)  +  4  ln(Sg)  +  “»!)/  «A 

*  <!>V>  /  «B  +  vu  >  '  *A 

*  vu  i,‘b‘ 

The  M-step 

By  differentiating  Epfa.s^Sg.SjJjf)  with  respect  to  a,  *A, 

*B  and  Sg,  and  solving  the  result ing  normal  equations,  we  have, 

m+  *  L=i(  *1 } ' p* 

s*  »  (  RSSy  +  trX'XV  )  /  pq, 

«I  •  <  bssa  *  lih1  vii  >  >  '  - 

4  ■  <  ““s  *  v  >  > ' 

where 

RSS^  -  (  2  -  Xg*  )'(  i  -  Xg*  ), 

rssa  ■  L-l<  <•'  -  »>2  >• 

RSS  =  )  P+J  (  b*2  ). 

B  4i=p+l  l 

The  successive  application  of  these  two  steps  will  result  in  the  marginal 
MLE  of  Sg,  ty  Sg,  and  a,  with  g  integrated  out,  namely,  s^, 

at.  s,  .  and  m+,  at  the  convergence  point. 

A  i 

One  interesting  characteristic  of  those  MLE's  of  the  component  variances 
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“*r  ■  (z<j)'F*!j,'<z(j)‘F’!j)’ 


L*.  ■  )j-l‘  L*i.  1  *  *  ‘“''A1' 


where 


L.  m  =  (a  -a) *C  *(a.-a) 
Aj*  -j  -  A  -j  - 


-1  * 
+  trC.  Q  . 
A  j 


When  the  locally  exchangeable  prior  is  used  for  the  factor  loadings 

•  * 


Q  and  a.  should  be  replaced  by 
J  J 

C~* 

‘j  '  '""j"  *  Ak  '-'-j' 


(4.6. 3. 5)  Q*  -  (  (l/d.)F*'F*+c7J+(N/d)V*  f\ 


and 


*  • 


-1 


-j  -  V  U/VF 


,tt 


if  variable  j  belongs  to  the  k  locally  exchangeable  group. 
The  conditional  expectations  should  be 
(4. 6. 3. 6)  Ep(  A,  D,  H  |  S,  Y  )  =  sane  as  (4.6. 3. 4) 

with  replaced  by 


LA*  '  2j=l[  LAj*  lf 
where 

LAj*  =  (-j~-k> *CAk(5j_5k) 


+  tr  +  lnlCAkL 


tb 


if  variable  j  belongs  to  the  k  group. 


Next  ve  consider  the  analytic  marginalization. 

(4. 6. 3. 7)  AAEp(  D,  H  I  S,  Y  )  ■  -2  ln(  /  f*(  A.  D.  H  I  S,  Y  )  d  A) 

=  5  P  l  a+'F*'FV/d. 

tj=l  -j  -j  3 

+  *<j>'^(j)/dj 

-  ‘>T\  i 

-J  j  "j 

+  In  |Q*I  ] 

4*1  J 

+  p  a'C^a  +  p  lnlcj  +  Lj,*  +  +  const.. 

Now,  when  C^1  — >  zero,  this  expression  reduces  to 

(4.6. 3. 8)  AAEp(  D.  H,  I  S,  Y  ) 

■  lhl  1  i(j>'fV,fVV'*(j) 

,,dJ 

-  .%*-v  i 

“j  J  ~J 

-  r  lnlDl  +  +  1^  +  4  +  const, 

V  v  .  •.  •-!  •  , 


■  )A’  lU)'l(J)/dj'5j'°j’!J  1 

-  r  lnlDl  +  Lp,  +  Lj>  +  const. 


Since,  when  CA  =  zero,  we  have 


(4.6. 3. 9)  Q*  ■=  d  A  F*'F*  +  N  V*  >_1. 
J  j 


fij  -  V  (1/Vf*'z<j> > 


=  (  F  'F  +  N  V  )  F 


•  •  *  • 

!j'°j  !J  ■  *jF  ‘£(J)/dj 


*  #  *  •  * 

(  * (F  'F  +N  V  ) )  /dj 


*  •  *  *  •  • 

•  .  'F  'F  •  +Na.'V  a./d.. 
~J  -J  “j  -j  j 


Therefore, 


(4.6.3.10)  X(J)',(J)/^^,; 

-2vQri!j 

•  *-l  • 

*  V°J  *J 

■  1  *<j)'*(J>  •  2  *j'F‘'*<J> 

•  *  •  *  •  «  * 

+  *j  F  'FVNVV*J  »«J 

*  (  RSS**  +  a*'(N  V*)a*  )  /  djt 

Finally,  we  have,  when  - >  zero, 

(4.6.3.11)  AAEp.(  D.  H  |  S.  Y  ) 

=  ).P,[  (  RSS.  +  a.'(N  V  )a.  )  /  d.  ] 

4=1  J  -j  -J  J 

+  -  r  In  I D I  +  1^  +  1^  +  const. 

Comparing  (4. 6. 3. 4)  to  (4.6.3.11)  we  notice  that  in  the  former  expression 
the  additional  term  doe  to  taking  the  expectation  is  added  to  the  summational 
term  and  that  in  the  later  the  multiplicative  factor  to  ln|D|  is  redoced  by  r. 


In  either  case  this  corresponds  to  the  adjustment  for  the  degrees  of  freedom  of 


the  error  variance.  See  (4. 6. 9.1)  and  (4. 6. 9. 2)  in  the  later  section.  That 


is,  in  both  cases,  the  estimate  of  D  will  be  expanded. 

It  should  also  be  noted  that  this  method  of  quasi  marginalization  does  not 
work  when  exchangeable  prior  is  used  for  the  factor  loadings  with  nonzero 


Quasi  Marginalization  with  respect  to  the  Error  Variacnes 
Consider,  again,  that  minus  twice  the  log  posterior  density  of  D  and  H  is 

given  by  E^D.HlS.Y)  or  A.^D.HlS.Y)  and  denote  it  by  f*(D,H|S,Y). 

(The  quasi  marginalization  of  Ep^(D,H|S,Y)  given  in  the  next  section  is  also 

considered  here.)  Sore,  again,  we  may  either  take  the  expectation  or  integrate 
D  out  analytically.  We  first  consider  taking  the  expectation. 

Write  minus  twice  the  log  density  as 

(4.6. 4.1)  -2  In  f*(D.H|S.Y) 

=  'S.Pt[  (  u  +  s  )/d,  +  (N+n+2)  In  d  ] 

Aj=i  j  j  i 

-  p(  n  ln(s/2)  -  2  In  Gamna(n/2)  ) 

+  4?*  +  LA*  +  L&' 


where 

•e  ••  ***•• 

u  »  RSS  +tr(F  ’F  +N  V  )Q  +Na  'V  a., 
j  J  J  J  J 

if  E^p  or  Ep^  in  the  later  section  is  used, 

**  *  *  * 

u.  =  RSS.  +  Na.'V  a.. 


if  A^Ep  is  asod. 


Therefore,  the  conditional  distribution  of  given  □,  S,  and  Y  is 
(4. 6. 4. 2)  di  I  H,  S,  Y  :  X_2(  N*+n  ,  u^+s  ) 


where  N  =  N,  if  E^p  or  Ep,^  is  used,  and 


N  -  N  -  r.  if 


itAKEP 


is  used. 


Since  those  distributions  are  independent  for  j“l,2, . . . »p,  the  joint  density  is 
given  by  the  product  of  the  inverted  chi  square  distributions  given.  Ve  denote 

the  density  by  f  (  dIh.S.Y). 

The  conditional  expectation  is  given  by 
(4. 6. 4. 3)  E^Ills.Y)  or  E^EpdllS.Y) 

-  -2  /  In  f*(D.H|s,Y)  f*(DlH,S,Y)  dD 

=  I  (u.  +  s)/d.  +  (N  +o+2)  v.  ] 

4j“l  J  J  j 

-  p(  n  ln(s/2)  -  2  In  Gaama(n/2)  ) 

+  b*  +  LA*  +  *11  +  const> 


where 


d*  -  1/E(  1/d.  )  =  (u.+s)/(N*+n), 
j  -  J  J 


and 


v  =  E(  In  d.  ) 
J  -  J 


=  ln(  (Uj+s)/2  )  -  psy(  (N  +n)/2  ) 


where  psy(x)  is  the  diganna  function,  i.e.,  psy(x)  *  d  lnGamna(x)  /  dx. 


[Johnson  and  Sots  (1969)1. 

The  tern  should  be  subtracted  from  (4. 6. 4. 3)  if  is  used. 

Next,  consider  the  analytic  integration  of  D.  The  integration  of 
f*(D,HlS,Y)  with  respect  to  D  is  straight  forward.  The  result  is 

(4.6. 4.4)  f(Hls.Y)  -  1^1 

(s/2)“/2/((u  +s))(N*+n)/2Gaaaa((N*+n)/2)/Gaama(n/2)  J. 

J 

However,  instead  of  straight  integration  we  may  approximate  the  result  by 
using  the  log  normal  approximation  used  in  Leonard  (1983).  That  is,  the  form 

of  f*(D,HlS,Y)  suggests  that 

(4. 6. 4. 5)  Oj  I  dj  :  N*.  dj  ), 


dj  ’•  x  (  n,  s  ), 


where  X2(  n,  s  )  denote  the  chi  square  distribution  with 
the  degrees  of  freedom  n  and  the  mean  ns. 

By  approximating  these  distributions  by  the  log  normal  distributions, 
Bartlett  and  Kendall  (1946),  we  have. 


(4.6.4. 6)  In  u.  I  d 
J  J 


N(  In  d  +  In  N*,  2/N*  ), 

J 


In  dj  :  N(  ln(s/n)  +  2/n  ). 

Therefore,  marginally, 

(4. 6. 4.7)  In  Uj  :  N(  ln(s/n)  +  In  N*  ,  2/N*  +  2/n  ). 


*  e-le.1  Ala.*  ^  -  -  «  -  »-  «  -  «  -  ■-*-  «*  V-  -* - 1  -*  -  ‘  -*  -* 


Finally,  the  resalt  is 


(4.6.4. 8)  ApE^mlS.Y)  or  A^EpttlS.Y) 

-  t  (  ln(u  )-(ln(s/n)+lnN*)  )2  ]  /  (  2/N*+2/n  ) 

AJ*1  J 

+  (  +  La<  )  +  const. 

When  the  quasi  marginalization  of  the  error  variance  by  taking  the 
expectation  is  performed  prior  to  the  quasi  marginalization  with  respect  to  the 
factor  loadings  we  have  a  similar  formola  for  w  namely, 

(4.6.4. 9)  E^HlS.Y)  -  same  as  E^HlS.Y) 

ee  e  *  * 

with  u  =  RSS .  +  Na.'V  a  , 

j  1  J  j 

where  RSS  is  defined  in  (4. 6. 3. 4), 

J 

* 

and  for  the  calculation  of  all  the  terms  d^  should  be  used. 

Given  E^dllS.Y).  E^UllS.Y),  ApE^HlS.Y),  or  A^EpUllS.Y) , 

we  consider  those  as  minus  twice  the  log  posterior  density  of  the 
hyperparameters  and  the  mode  of  those  distributions  are  to  be  estimated  in  the 
subsequent  tf-step.  However,  we  first  oonsider  the  case  where  the  factor 
loadings  are  first  marginalized. 


Conditional  Distribution  of  A  Given  F,  D,  H,  S,  and  Y 
In  this  section  the  derivation  of  Epp^  and 

loadings  are  first  marginalized  by  the  EM  algorithm  will  be  presented. 
By  collecting  and  Lp^  from  (4.3.5)  and  (4.2.16),  respectively. 


ApEp^,  where  the  factor 


and  using  the  sene  tricks  ss  before,  we  have 


AV 

JI.V 


(4.6.5.1)  I^Aj  +  LAj  = 

•  e-i  * 

(a-a.)'Q.  (a  -a.)  +  const, 
“j  “J  j  "j  -j 


where 

Q*  -  (  (l/dJF'F  +  C"1)"1. 

J  j  A 

<i/dJ)P-Z(j)  *  c->.  >. 

This  indicates  that  given  F,  D,  H,  S,  and  Y, 


(4.6. 5.2)  a  I  F.  D.  II,  S.  Y  :  N  (  a*  Q*  ). 

J  *  J  J 

Since  these  are  independent  for  jsl,2,...,p,  the  joint  density  is  the  product 
of  these  densities  and  we  denote  it  by  f (A|F,D,H,S,Y) . 

When  the  locally  exchangeable  prior  distribution  is  used  for  the  factor 
loadings,  similar  correction  as  (4. 6. 3. 5)  is  necessary.  That  is,  replace  all 


the  CA's  and  a's  by  and  a^  if  the  variable  j  belongs  to  the  k 


th 


group. 


Conditional  Expectation  of  Minus  IVice  the  Log  Posterior  Density 
We  have 

(4.6. 6.1)  Ea(F.D,H|S.Y)  =■  /  L  f (A|F,D,H,S,Y)  d  A 

■  <  Bss]  *  »  F'roj  Wv  LAj*  1 

+  p  In |CA I  +  Lj,  +  Ljj  +  Lg  +  const.. 


where 


I r 


V  -  <!j-:>'cA  <!J-i)«tC*0J 


Ve  write 

(4. 6. 6. 2)  La#  =  LAj<  ]  +  p  lnlcj. 

When  tbe  locally  exchangeable  prior  is  used  similar  correction  as 
(4. 6. 3. 6)  is  necessary. 


Quasi  Marginalization  with  respect  to  the  Factor  Scores 
Now  consider  E^F.D.HlS.Y)  as  minns  twice  the  log  of  the  posterior 

density  of  F,D,H  and  denote  it  by  f  (F.D.IllS.Y).  Then,  by  writing 
observation  wise,  we  have 

(4.6.7. 1)  Ea(F.D.H|S.Y) 

"L=i[  <xrAV'D’1(*rAV 

+  f  '(  Cp1  +  Q./d.  )f.  1 
+  La<  +  1^  +  1^  +  const., 
where 

-  2j»i'  Vdj  i- 

Again,  using  the  same  tricks,  we  have 

(4. 6. 7. 2)  f  I  D.H.S.Y  :  Nr(  f*,  V*  ) 


where 

*  •  •  _i  _i 

V  *  (  A  ’D  *A  +  Q./d.  +  £  )  , 


I 


-i  *i* 

Those  condi tonal  distributions  ere  independent  for  i*l,2, .. . ,N,  the  joint 

density  is  given  by  the  product  of  those  and  we  denote  it  by  f*(FlD,H,S.Y). 
Also,  we  write, 

(4.6.7. 3)  F*  -  Y  D_1  A*  V*. 

When  the  loeally  exchangeable  prior  is  used  for  the  factor  scores  a 
similar  correction  as  (4. 6. 1.4)  is  necessary. 

The  conditional  expectation  of  E^(F,D,H}S,Y)  is 

(4. 6. 7. 4)  E^a<D,H|S.Y)  -  /  Ea(F,D,H|S,Y)  MFiD.H.S.Y)  dF 

V  N  e  •  -i  *  * 

-2A1  (zrA*i),D 

+  f*‘(  Q./d.  +  C^1  )f*  ] 

+  N  tr(A*‘D"1A*+Q./d.-K^1)V* 

+  lf*  +  la*  +  1d  +  lb  +  COMt- 

*  )*[  (  RSS  +  tr(F  'F  +N  V  )Q. 

4J«1  j  j 

*  n  .v.;  >/4j  i 

+  Lp^  +  La#  +  +  Lp  +  const,  not  including  D, 

.h.»  rss7  - 

and 

'  L-11  *>•«•<># 


y) 


r 


When  the  locally  exchangeable  prior  ia  need  for  the  factor  scores  a 
siailar  correction  as  (4.6.2. 5)  is  necessary.  That  is,  replace  all  the 

occurrences  of  N  tines  V  by  ^v^[npvVv]  and  modify  Lp,  to 

include  f^. 

It  should  be  noted  that  the  final  expression  in  (4. 6. 7. 4),  though  it  is 
the  sane  as  the  one  in  (4. 6. 3. 4),  is  a  different  one  since  the  definition  of 

e  e 

F  and  A  in  them  are  different.  With  this  difference  in  mind,  we  can 
perform  the  same  quasi  marginalization  with  respect  to  the  error  variance  as 
before. 

Estimation  of  the  Hyperparameters:  the  M-step 
Here,  the  minimization  of  minus  twice  the  conditional  expectation  of  the 
posterior  density  of  H  given  S  and  Y  is  considered.  The  expectations  are  given 
in  (4. 6. 4. 3),  (4.6.4. 8)  and  (4. 6. 4. 9).  It  is  now  necessary  to  specify  the  form 
of  f(H|S).  When  some  information  is  available  we  may  use  the  conjugate  form 
suggested  in  Lindley  and  Snith  (1972)  or  Lindley  (1971).  When  there  are  little 
information  we  can  use  uniform  prior  distribution  for  H.  Although  the  use  of 
uniform  prior  distribution  in  the  case  where  many  parameters  are  to  be 
estismted  is  subject  to  criticism,  we  can  argue  that  it  does  not  hurt  the 
estimation  since  the  number  of  random  variables  which  have  uniform  prior 
distribution  is  greatly  reduced,  Lindley  (1975).  Since  the  informative 
specification  of  the  prior  distribution  of  the  hyperparameters  is  usually 


difficult,  throughout  this  section  uniform  prior  it  used  for  the 
hype rparame te r s . 

First,  we  minimize  with  respect  to  the  hyperparamters  of  the  factor 

loadings,  H, .  Since  the  term  which  includes  H.  is  the  same  for  all  the 
A  A 

cases,  we  have 

(4.6. 8.1)  a+  *  a*  ]  /  p. 

and 

+  *  *  v  n  e 

C  «  (  A  'JA  +  )  P  [  Q.  ]  )  /  p. 

A  4.j=l  j 

where  J  ■  I  -  (l/p)l  1  '. 

P  “P-P 

When  the  globally  exchangeable  prior  is  used  for  the  factor  scores  with 
f«0  and  (^,=*1^,  those  mode  are  not  unique  in  the  sense  that  an  orthogonal 

rotation  of  A  by  an  orthonormal  matrix,  say,  T,  results  in  a  different  mode  T'a 

and  T'C^T,  which  also  gives  the  minimum  of  those  expectations.  Therefore, 

the  off  diagonal  elements  of  should  be  set  equal  to  zero. 

For  the  hyperparameters  of  the  error  varinaces,  H^,  when  the  expectation 

method  is  used,  we  have  the  following  derivatives. 

(4.6. 8.2) 

dE/dn  =  Vj  J  -  p(  ln(s/2)  -  psy(n/2)  ), 

dE/ds  »  1/dj  1  -  pn/ s, 

d2E/dn2  =  (1/2)  p  psy'(  n/2  ), 
d2E/dnds  “  -  p/s. 
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d^E/ds2  ■  pn/s2. 

where  psy'(x)  is  the  triganna  function,  Johnson  and  Kotz  (1967). 

With  these  derivatives  we  can  solve  for  n+  and  s+  by  the  Newton-Eaphson 
method.  Since  there  are  only  two  paraaieters  involved  the  process  does  not 
require  such  iterations. 

When  the  log  normal  approximation  is  used  the  solution  becomes  much  more 
simple.  That  is, 

<4.6. 8.3)  n+  =  l/(  (1/2)]^ (In  «  -u.)2J/p-l/N*  ) 

s+  =  Exp[  ^.^[Ujl/p  -  In  N  ]  x  n+ 

where  u.  *  VP,  [  In  u.  ]  /  p. 
dj=l  j 

When  the  locally  exchangeable  prior  for  the  factor  loadings  is  used 
(4.6. 8.1)  should  be  replaced  by 

(4.6. 8.4)  a*  -  ^Ujl/n^. 

and 

CAk  “  \  Jk\+Ij*ktQj1/nAk 

e  e 

where  the  saaie  partition  as  P  before  is  assumed  for  A  and  is  the 
n^  x  n^  centering  operator.  When  a  priori  zeros  are  specified  for  some 
of  the  elements  of  a^'s  those  must  be  set  equal  to  zero  in  (4. 6. 8.4).  This 
does  not  affect  the  estimation  of  C^'s  since  this  type  of  the  restrictions 


s.' 

O 


■/ 

s' 


■r-  . 


of  the  location  paraaieters  can  be  handled  independently  of  the  scale 
parameteres,  hard is.  et.al.  (1979). 


9 


When  the  locally  exchangeable  prior  for  the  factor  scores  are  used  we  have 
to  estimate  the  hype rparame ter s.  Differentiation  of  with  correction  in 

(4. 6. 2. 5)  gives 

(4.6.5.5)  £  - 


4  -  *  v 

where  is  the  n^  x  n^  centering  operator. 

To  enforce  the  orthogonality  of  the  model  at  least  one  of  the  Cp^ 
matrices  should  be  set  equal  to  1^.  Also,  to  avoid  the  rotational 
indeterminacy,  one  of  the  C^'s  should  be  constrained  to  a  diagonal  swtrix. 


Marginal  Mode  of  the  Error  Variances  Given  the  Hype rparame tors 
Conditioned  on  the  hype rparame ter s,  the  marginal  mode  of  the  error 
variances  can  be  calculated  by  a  variation  of  the  EM  algorithm.  That  is, 
depending  on  the  method  of  quasi  marginalization  of  the  factor  scores  and  the 
factor  loadings,  the  mode  of  the  error  variances  is  given  as  follows.  When 
EAF  or  *Va  is  u,ed'  (tom  (4. 6. 3. 4)  and  (4. 6. 7. 4)  we  see  the  marginal  moo* 


of  the  error  variance  is  given  by 


(4.6. 9.1)  d  *  (  u.  +  s  )  /  (N+n+2),  j-1,2 . . 

J  J 


ee  •  •  eeeee 

u.  -  RSS.  +tr(F  'F  +N  V  )Q  +a.'V  a. 


and  RSS .  ,  F  ,  V  ,  A  ,  and  Q  are  defined, 

J  J 

in  the  corresponding  E-step  formulae,  namely, 

either,  in  (4. 6. 3. 4),  (4. 6. 1.1),  and  (4. 6.3. 2)  or  in  (4. 6. 7. 4), 

(4. 6. 7.2),  and  (4. 6. 5.1)  depending  on  the  choice  of  E^.  or  E^, 

respectively.  When  the  locally  exchangeable  proir  is  used  correspoinding 
corrections  should  be  made. 

When  is  used  the  estimate  becomes,  from  (4.6.3.11), 

(4. 6. 9.2)  dJ=(nj  +  s)/(N  +  n-  r  +  2).  j-1,2, . . .  ,p. 
where  u^  ■  RSS^  +  Sj'(N  V  )Sj. 

* 

As  noted  before,  the  lack  of  the  term  including  Q.'s,  which  comes 

J 

from  taking  the  expectation  with  respect  to  the  factor  loadings,  is  compensated 
by  reducing  the  denominator  in  (4. 6. 9. 2). 

Marginal  Mode  of  the  Factor  Scores  Given  the  Error  Variances 
Conditioned  on  the  error  variances  and  the  hyperparameters  marginal  mode 
of  the  factor  scores  can  be  calculated  by  the  EM  algorithm  where  the  factor 
loadings  are  treated  as  the  missing  data.  The  result  is  given  in  (4. 6. 5.1), 

(4. 6. 6.1),  (4. 6. 7. 2),  and  (4. 6. 7. 3).  Here,  the  first  two  formulae  are 
considered  to  be  the  E-step,  and  the  last  two,  the  M-step.  (  One  to  the 

• 

synsetry  of  the  conditional  density,  F  is  not  only  the  mean  but  also  the 

* 

mode.)  It  should  be  noted  that  this  formula  is  not  linear  in  terms  of  F 
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since  V  include  Q  's  which,  in  torn  include  F  .  When  the  locally 

J 

exchangeable  prior  distributions  are  used  corresponding  corrections  should  be 
made . 


Marginal  Mode  of  the  Factor  Loadings  Given  the  Error  Variances 
Conditioned  on  the  error  variances  and  the  hyperparameters  marginal  mode 
of  the  factor  loadings  can  be  calculated  by  the  EM  algorithm  where  the  factor 
scores  are  treated  as  the  missing  data.  The  result  is  given  in  (4. 6. 1.1-3), 
(4.6. 2.1),  and  (4. 6. 3. 2).  Here,  the  first  four  formulae  are  considered  to  be 
the  B-step,  and  the  last  one,  the  M-step.  (  Due  to  the  synmetry  of  the 

conditional  density.  A*  is  not  only  the  mean  but  also  the  mode.)  When  the 
locally  exchangeable  prior  distributions  are  used  corresponding  corrections 
should  be  made. 


Joint  Mode  of  the  Factor  Scores  and  the  Factor  Loadings 
Given  the  Error  Variances 

Conditional  on  the  error  variances  and  the  hyperparameters,  the  joint  mode 
of  the  factor  scores  and  the  factor  loadings  can  be  given  by  a  straight  forward 
minimization  since  there  is  nothing  left  for  marginalization.  Successive 
application  of 

(4.6.12.1)  F+  *  Y  D"1  A  V*. 

where  V+  -  (  A*D_1A  +  Cp  f1. 


(4.6.12.2)  -  Qj(  (l/djJF'x^j  +  CA‘a  ). 

where  Ql  -  (  (l/d.)F'F  +  c"1  )_1. 

J  J  A 

give*  the  joiot  mode.  Note  that  no  extra  term  doe  to  taking  the  expectation  i$ 
involved.  When  locally  exchangeable  prior  diatribntions  are  need  corresponding 
corrections  should  be  made. 

Sunmary  of  the  Algorithms 

A  summary  of  the  six  methods  props ed  in  the  previons  sections  for  the 
estimation  of  the  modal  value  of  the  hyperparameters  is  found  in  Table  1.  The 
necessary  formulae  are  listed  for  each  step,  namely,  the  B-step  and  the  quasi 
marginalization  and  the  It-step.  The  two  steps  are  iterated  until  the  process 
converges. 

Technical  Notes 

Unless  specified  otherwise  the  following  initial  configuration  is  nsed. 
Initial  Factor  Loadings 

(4.6.14.1)  Principal  Component  Solution  based  on  (1/N)Y'Y  matrix. 

Initial  Error  Variances 

(4.6.14.2)  diag(  (1/N)Y'Y  -  A'A  J,  where  A  is  the  initial  factor  loadings. 
Initial  Factor  Scores 

(4.6.14.3)  Y  D_1  A  (  A'  D_1  A  +  cl1  )_1. 


When  no  a  priori  zeros  are  specified  it  may  be  efficient  to  orthogonally  rotate 
the  initial  configuration  given  above  so  that 

(4.6.14.4)  A'J  A  -  diagonal 

P 

is  satisfied  and  enforce  the  restriction 

(4.6.14.5)  =  diagonal 

in  the  M-step.  When  the  hyperparaneters  have  uniform  prior  distributions  this 
restriction  eliminates  the  rotational  indeterminacy.  When  there  are  a  priori 
zeros,  the  following  orthogonal  rotation  to  the  target  matrix  B  should  be 
performed  on  the  initial  configuration  given  in  (4.6.14.1). 

(4.6.14.6)  B  m  [  b  1.  j  ■  1.  2,  ....  p.  e  *  1.  2,  ....  r. 

J® 

where  b  **  0  if  j  belongs  to  k  and  a^#  *  0, 

=  missing, 

where  k  *  1,  2,  ....  e  *  1,  2,  ....  r,  denotes 

the  locational  hyperparameter  of  the  factor  loadings. 

The  iteration  process  should  be  terminated  when  certain  convergence 
oriterions  is  satisfied.  For  the  programs  where  the  calculation  is  done  with 
double  precision  numbers  the  criterion 

Successive  absolute  difference  of  n  is  less  than  or  equal  to  .001, 
and 

Successive  absolute  difference  of  Ay  j*l,2,...p,  is  less  than 
or  equal  to  .00001, 

is  used.  Since  the  degrees  of  freedom,  n,  has  a  value  comparable  to  the  nuafcer 
of  observations,  the  convergence  criterion  for  n  should  be  adjusted  according 
to  the  number  of  observations.  Also,  the  convergence  criterion  for  the  error 


variances  should  be  adjusted  according  to  the  observed  variance  of  each 


variable. 

When  performing  the  E^,.^  or 

distribution  of  the  hyperparameters  the  conditional  expectation  of  F  tends  to 
shrink  toward  zero.  This  is  due  to  the  unbound edness  of  the  marginal 
likelihood  of  F  and  0  in  (4.4.1).  Therefore,  some  normalisation  such  as 

(4.6.14.7)  F*'F*  -  N  Ir» 

and  is  necessary. 

Also,  due  to  the  unboundedness  of  (4.4.3)  with  uniform  prior  on  s,  algorithms 
such  as  of  ^FAD  are  i*?*0**^*'  The  essential  difference  between 

those  two  cases  is  that  in  the  latter  case,  where  D  is  marginalized  first,  the 
conditional  expectation  of  1/d^  or  is  a  function  of  RSSj  only  and  can 

easily  becoam  zero. 

When  no  a  priori  zeros  are  specified  we  may  be  able  to  rotate  the  solution 
to  a  simple  structure.  Ibis  is  possible  because  we  have  uniform  prior 
distributions  for  the  hyperparameters.  It  should  be  noted,  however,  that  the 
rotation  must  be  performed  not  only  on  the  parameter  matrices  but  also  on  the 
hyperparameters.  That  is,  denoting  the  rotated  matrices  by  #,  the  result  of 
the  rotation  defined  in  (2.1.14)  is 

(4.6.14.8)  F*  -  F  T, 

A*  *=  A  (T')"1. 

aj  -  T*1^,  k  -  1,  2 . GA, 

-  T-1C..(T')-1,  k  -  1.  2 . 


*  * 

and  corresponding  rescaling  of  V  ,  C^,  A  ,  a. 


AnEc.  method  with  a  uniform  prior 


It  should  bo  noted  that  the  conditional  dispersion  matrices  such  as  V 


and  0^  may  be  interpreted  as  the  lower  bound  of  the  posterior  marginal 
dispersion  matrices  of  f ^  or  a^,  respectively,  in  the  sense  that  the  real 


marginal  dispersion  matrices  are  larger  than  the  conditional  dispersion 
matrices.  That  is.  if  some  of  the  conditional  dispersion  matrices  are  very 
large  it  indicates  that  the  factor  analytic  model  with  the  given  dimensionality 
does  not  fit  the  data. 


e  e 

Also,  the  matrix  (1/N)F  'F  is  the  sample  dispersion  matrix  of  the 

e 

factor  scorea  calculated  from  the  estimate  F  .  Therefore,  specification 
errors  with  respect  to  the  number  of  dimensions  may  be  checked  by  the  diagonal 
eleawnts  of  this  matrix.  That  is,  if  some  of  the  sample  variances  are  small, 
it  indicates  that  we  might  have  specified  too  many  factors.  In  this  case  the 
analysis  should  be  perforated  with  a  smaller  number  of  factors. 

Then  some  of  the  error  variances  are  zero  use 

(4.6.14.9)  V*  -  -  CpA'  (D+ACpA*)”1  ACp, 

W  =  D_1AV*  -  (  ACpA'  +  D  )_1  ACp, 

WFV+NvVd^1  I"1. 

•  •  e  -l-l 
and  a.  «  (F  'F  +NV  +d  C.  ) 

”J  J  A 


CHAPTER  V 


EVALUATION  OP  THE  1CIHOD 


Convergence 

In  order  to  check  for  convergence,  the  eolations  using  the  following  nine 
methods  were  coshered. 

1.  M.L.E.  by  SAS 

2.  Marginal  Estimate  of  the  Eror  Variances 

i.e.,  without  marginalization  with  respect  to  D 

3.  Marginal  Estimate  of  the  Eror  Variances, 

i.e.,  E^p  without  marginalization  with  respect  to  D. 

4*  ^DA^F 
5*  ^D^AF 

VW 

1'  edaf 
■•^TA 
9-eauf 

The  three  data  matrices,  namely,  sixteen  psychological  tests  from 
Harman  (1976,  ppl23-124),  ten  artificial  variables  from  Francis  (1983), 

(see  Seber(1984)),  and  five  mathematics  tests  from  Mardis,  at.,  al.  (1979)  are 


analyzed  with  r  *  4,  2,  and  1,  respectively.  The  first  matrix  is  a  correlation 


matrix  and  the  rest  are  dispersion  matrices.  The  elements  of  the  last  matrix 


are  divided  by  1000  in  order  to  keep  the  numbers  in  a  reasonable  range.  For 
the  factor  loadings,  in  order  to  make  the  comparison  possible,  - > 


zero  is  used. 

For  methods  2  and  3,  the  convergence  criterion  is  such  that  the 


and  for  the  last  six  methods,  the  criterion  for  convergence  require  the 
absolute  difference  of  n+  <=  .001. 

The  results  are  shown  in  Figures  1-3,  where  the  mean  and  the  variance  of 
the  estimated  d^'s  are  also  shown.  We  can  conclude  from  these  Figures  that 

six  variations  of  the  EM  algorithm  are  almost  similar.  Therefore,  the  quasi 
marginalization  may  be  regarded  as  a  very  good  approximation  to  the  real 
marg inal izat ion . 

Since  is  easy  to  modify  for  the  calculation  of  MLE  and  the  marginal 

estimate  of  the  error  varinaces  in  the  later  analysis  only  is  used. 


Robustness  to  Initial  Configuration 

In  order  to  evaluate  the  robustness  or  the  sensitivity  to  the  initial 
configuration  the  correlation  matrix  in  Table  2(e)  was  analyzed  with  a 
variation  of  the  Ep^p  method  with  several  different  initial  configurations. 

The  data  matrix  is  calculated  by  Oavis  (1944)  and  used  by  Martin  and  McDonald 
(1975)  and  found  to  result  in  a  Heywood  case  with  zero  error  varinace  for  the 
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first  variable. 

The  variables  are: 

1.  Knowledge  of  word  meanings 

2.  Ability  to  select  the  appropriate  meaning  for  a  word  or  phrase  in  the 
light  of  its  particnlar  contextual  setting 

3.  Ability  to  follow  the  organization  of  a  passage  and  to  identify 
antecedents  and  references  in  it. 

4.  Ability  to  select  the  main  thought  of  a  passage 

5.  Ability  to  answer  questions  that  are  specifically  answered  in 
a  passage 

6.  Ability  to  answer  questions  that  are  answered  in  a  passage  but  not  in 
the  words  in  which  the  question  is  asked 

7.  Ability  to  draw  inferences  from  a  passage  about  its  content 

8.  Ability  to  recognize  the  literary  devices  used  in  a  passage  and  to 
determine  its  tone  and  mood 

9.  Ability  to  determine  a  writer's  purpose,  intent,  and  point  of  view, 
i.e.,  to  draw  inference  about  a  writer 

A  special  version  of  the  program  was  developed  which  uses  a  uniform 

factor  loadings  prior,  a  globally  exchangeable  factor  score  prior,  and  a 
globally  exchangeable  error  variance  prior.  The  degrees  of  freedom  and  the 
scale  parameter  of  the  inverted  chi  square  distributions  are  calculated  with 
the  program  using  the  same  prior  distributions  as  stated  above  for  the 

factor  scores  and  the  factor  loadings.  Since  the  main  interest  here  is  to  see 
the  effect  of  the  globally  exchangeable  prior  distribution  of  the  error 
variances  on  the  analysis  of  a  Ileywood  prone  data  matrix  and  the  comparison 


between  the  Bayesian  solutions  and  the  MLE,  the  marginalizations  of  the  factor 
loadings  and  the  error  variances  are  not  performed.  That  is,  the  program 
calculates  the  posterior  joint  mode  of  the  factor  loadings  and  the  error 
variances  with  the  prior  distributions  specified  above.  The  only  difference 
between  this  special  program  and  the  method  proposed  by  Rubin  and  Thayer  (1982) 
is  that  the  former  uses  the  inverted  chi  sqaare  prior  distributions  as  the 
prior  distribution  of  the  error  variances.  It  is  assumed  that  the  algorithm 
has  converged  when  the  mean  partial  derivative  of  (4.4.2)  with  respect  to  D  is 
less  than  or  equal  to  .00001. 

Eleven  different  initial  configurations  of  the  factor  loadings  and  the 
error  variances  are  calculated  as  follows.  (  The  initial  configuration  for  the 
factor  scores  is  calculated,  given  A  and  D,  by  (4.6.14.3).) 

1.  BLR 

Result  of  with  globally  exchangeable  factor  score  prior  distribution 


and  uniform  factor  loading  and  error  variance  prior  distributions 
without  marginalization  of  the  factor  loadings  and  the  error  variances. 
This  is  the  MLE  by  the  EM  algorithm  proposed  by  Rubin  and  Thayer  (1982). 
2.  BLM 


Result  of  with  globally  exchangeable  factor  score  prior  distribution 


uniform  factor  loading  and  error  variance  prior  distributions 
without  marginalization  of  the  factor  loadings  and  the  error  variances. 
Instead  of  (4.6.14.2)  its  mean  is  used  as  the  initial  estimate  of  all 
the  error  variances. 


This  is  also  the  MLE  by  the  EM  algorithm  with  a  different  initial  estimate. 


Result  of  with  globally  exchangeable  factor  aoore  and  error 

variance  prior  distributions  and  uniform  factor  loading  prior  distribution 
The  hyperparaoeters  n  and  s  are  estimated  here. 

4.  BUM 

Result  of  with  globally  exchangef.jle  factor  score  and  error 

variance  prior  distributions  and  uniform  factor  loading  prior  distribution 
Instead  of  (4.6.14.2)  its  mean  is  used  as  the  initial  of  all 
the  error  variances. 

Although  the  hyperparameters  are  estimated  here  they  are  not  used 
in  the  later  analysis. 

5.  BMR 

Result  of  with  globally  exchangeable  factor  score  prior  distribution 

and  uniform  factor  loading  and  error  variance  prior  distributions 
without  marginalization  of  the  error  variances. 

This  is  the  marginal  MLE  of  the  err*.  .riances. 

6.  . 

Result  of  E^^p  with  globally  exchangeable  factor  score  prior  distribution 

and  uniform  factor  loading  and  error  variance  prior  distributions 
without  marginalization  of  the  error  variances. 

Instead  of  (4.6.14.2)  its  mean  is  used  as  the  initial  of  all  the 
the  error  variances. 

This  is  also  the  marginal  MLE  of  the  error  variances. 

7.  SSMC 

MLE  by  SAS  PROC  FACTOR  with  PRIOR  SMC  option. 
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[  df/3X  ]  =  3(tr[3f/3Y]  'Y)/dX. 

c 

Proof 

3f/3x  =*  V  (  3f/3y  3y  /3x  ) 

pq  Zi.j  'ij  Jij  pq 

*  tr  t  3f/3Y  ] •  [  3Y/3x  ] . 

pq 

=  3(tr[  3f/3Y  ]  'Y)/3x  , 

c  pq 

where  [  3Y/3x  ]  *  [  3y  /3x  ],  a  x  b.  II 

pq  Jij  pq 

When  evaluating  [  3f/3Y  J  treat  Y  as  if  all  of  its  elenents 

are  distinct  even  if  Y  is  symmetric. 

As  a  special  case  of  b“l  and  d*=l  we  have 
3f/3x  »  [  3f/3y  ] *  t  3j/3x  ] , 

where  [  3jr/3x  ]  «  l  dyjdx^  ].  i*l,2,...,a,  j=l,2,...,c,  axe. 

Also,  when  Y  =  x  I,  we  have 

[  3f/3x  ]  »  tr[  3trt3f/3Y]  /3x  ]. 

c 

For  some  specific  forms  of  f  we  have 
dtrAX/dX  -  A',  if  X  is  distinct. 

*  A+A*-diag(A) ,  if  X  is  syonetric. 

3trX'AX/3X  *  (A+A')X,  if  X  is  distinct, 

*  (A+A')X+X(A+A’)-diag[(A+A’)X] ,  if  X  is  symmetric. 
dtrXAX'/dX  =  X(A+A'),  if  X  is  distinct, 

=  X(A+A')+(A+A')X-diagI(A+A')X] ,  if  X  is  symmetric. 
3trX'AXB/3X  =  AXB+A'XB’,  if  X  is  distinct, 

-  AXB+BXA+A'XB'+B’XA'-diag(AXB)-diag(BXA) ,  if  X  is  symmetr 

3X_1A/3X  *  -(X^A'I-1)'.  if  X  is  distinct. 
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[  3f/3Y  ]  -  [  af/3yAj  1.  1-1.2.. ...a.  j-1.2.....b.  a  *  b, 

and.  tbs  derivative  of  f  with  respect  to  X. 

[  df/dX  ]  -  [  af/dx^  1.  i-1.2,. . . ,c,  j-1,2, . . . ,d,  c  x  d. 

Therefore,  the  derivative  with  respect  to  Y'  is 
t  3f/3Y*  ]  -  [  3f/3Y  ]'.  b  x  a. 

Also,  when  Y  is  diagonal, 

I  3f/3Y  ]  -  diag[  3f/3Y  ]. 

Uaing  the  following  properties  of  trace  we  have  two  asefnll  roles  for 
differentiation. 

trOV  -  trVD  -  trV'U'  -  trU'V.  and  trUV  -  ^  j(  nijvji  )• 

1.  Product  Role 

Let  D  and  V  be  functions  of  X.  Then, 

3trUV/3X  -  dtrUV  /3X  +  3trU  V/3X. 

c  c 

where  subscript  c  denotes  that  the  matrix  with  c  is  to  be  held 
constant  for  the  purpose  of  differentiation. 

Proof 

StrUV/dx  -  d).  .(  u. .v  ,  )/3x 

pq  Li,j  1J  ji  pq 

*  ^  (3u,,/3x  v  )  +  ^  (u  3v ,./3x  ) 

L  ij  pq  ji  t  ij  ji  pq 

-  aW  .(v.  .)  )/3x  +  3^((u  )  v ,./3x 

i  ij  ij  c  pq  L  ij  c  ji  pq 

-  dtrUV  /3x  +  3trD  V/3x  .  t! 

c  pq  c  pq 

2.  Chain  Rule 

Let  Y  be  a  function  of  X.  Then, 
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■ 51  > 

■1.,'  w 1 

*  trCE(Y) 

*  trCE(xx') 

-  trC(  V*  +E(x)E(x) '  ) 

*  *  * 

*  trC(  V  +x  x  '  ) 

•  » 

*  trCV  +  trCx  x' 

=  trCV  ♦  x  'Cx  .  II 


Partial  Differentiation  of  Some  Functions  of  Matrices 


Following  rales  and  formulae  are  collected  from  Schoenemann  (1965), 
Rao  (1973),  Press  (1982).  and  Nel  (1980). 

Definition: 

Let  f  =  f  (  Y  ) , 


where  f  is  a  matrix  to  scalar  function, 

Y  *  [  yAj  ].  i-1.2 . a.  j-1.2 . b,  a  x  b. 

1  *  1  *ij  1  *  i=1,2 . °*  c  x  d. 


yij  ”*ij(  *  >' 


where  g^,  1*1,2, ...  ,a,  j*l,2,...,b,  is  a  matrix  to  scalar  function. 


Then,  the  derivative  of  f  with  respect  to  Y  is  defined  as 


D  ACA'D  A)  ‘(Cp  +A'D  A)-D  A 


-  d"1A(A*d’1A)_1cJ1. 

Therefore, 

0_1A(  (A*D”1A)""1+€jf)CjJ1  «  D_1A( A 'D_1A) ~1<^1 . 
Q-1A<  (A'D~^A)~^-K]!p)  -  D^ACA'D^A)”1. 
Q'^d+CpA'D^A)  -  D_1A. 

0_1A  =  D_1A( I+CpA'D-1A)-1 . 

■  d~1a(c^1+a,d”1a)*"1c^1.  II 


Expectation  of  Qnadratlc  Fora 

Let  E(x)  -  i  and  D(x)  *  V  . 

Then  E(  x’Cx  )  »  x*'Cx*  +  trCV*. 

Proof 

x'Cx  *  trCxx ' 

cijya  ’• 

where  Y  -  xx’ . 

Therefore. 


E(x'Cx)  -  E(  trCY  ) 


Let  A,  B,  and  C  be,  respectively,  p  x  p,  p  x  q,  and  q  *  p.  Then 


3.  |  A  +  BC  I  *  |  A_1  I  I  I  +  A_1BC  | 

P 

-  I  A_1  I  I  I  +  CA_1B  I . 

q 


Lawley't  Trick 

Q_1A  -  D-1A  (C^1+A'0"1A)"1  cj\ 
where  2  =  A  Cp  A'  +  D. 

Proof 

By  the  Matrix  Inversion  Foraala  1  we  have 
if1  -  D_1-D_1A(  C^1  +  A'D_1A  )_1A'D_1. 

Multiply  A(A'D~1A)_1(cJ1+A,d"1A)  fro*  the  right.  Then, 

left  -  Q-1A(AD~1A)”1(c£1+A,D~1A) 

-  Q~*A  ( A  *  D~*A)  ~1<^1  +  Q_1A 

-  fl^AaA'D^Al^cJ^I) 

-  q"1A((A'D_1A)"1-»Cf)C^1. 
right  -  d'1A(A'D_1A)_1(C^1+A'D~1A) 

-D_1A(C^1+A,d'1A)''1A,D'1A(A,D_1A)'1(C^1+A'D"1A) 
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San  of  Squares  Congletion  Trick 
Lot  x,  m,  and  b  be  p  x  1,  end  A  and  B  be  p  x  p  syametrix. 
Then, 

q  ■  (x-a) 'A(x-a)  +  (x-b) 'B(x-b) 

■  (x-d) ’D(x-d)  +  c, 

where, 

D  -  A  +  B. 

d  -  D_1(Aa  +  Bb). 
and 

c  -  a'Aa  +  b'Bb  -  d'Dd. 

Proof 

q  -  x ' ( A+B)x-2x ’ ( Aa+Bb ) +a 'Aa+b ' Bb 
-  (x-(A+B)_1 (Aa+Bb))  '(A+B)(x-(A+B)-1 (Aa+Bb >) 
-(Aa+Bb)(A+B)-1(Aa+Bb)+a'Aa+b'Bb.  II 

Matrix  Inver « Ion  and  Determinant  Trick 
Let  A  and  B  be  square.  Then, 

1.  (  A  +  UBV  )_1  *»  A_1  -  A_1U(  B_1  +  VA_1D  )_1VA_1. 

2.  (  A  +  B  )_1  -  A_1  (  A-1  +  B-1  )_1  B_1. 


‘  1 
I 

::3 

A 


A 


Ill 


Bat, 

Y+  -  XB  =  XB+  -  XB  =  X(  B+-B  ), 
and 

U*'(Y+-XB)  -  Y’PX(B+-B)  =  zero, 
•ince  PX  -  zero,  II 


Projection  Operator  Trick  2 

Let  Z  be  n  x  p,  V,  n  x  r,  B,  r  x  p,  and  A,  p  x  p. 
Then, 

Q  -  (  Z  -  VB  ) '  A*1  (Z-VB) 

-  Z*A_1/2PA“1/2Z  +  (  B  -  B+  )'  V*A-1V  (  B  -  B+  ), 
where, 

P  *  I  -  A~1/2V  (V'A-1V)_  V'A_1/2, 
n 

and 

B+  ■  (  V'A~^V  )_V'A-1Z. 

Proof 

Apply  the  trick  1  to  the  transformed  variables 
Y  -  A~1/2Z  and  X  =  A~1/2V.  1 1 


O 

<Vi 


I 


Projection  Operator  Trick  1 


Let  T  be  n  *  p,  X,  n  *  r,  end  B,  r  x  p. 


Q-(Y-XB)'(Y-XB) 


Y'PY  +  (  B  -  B+  ) '  X'X  (  B  -  B+  ) , 


P  =»  I  -  X  (X'X)  X. 
n 


B  -  (X'X)  X'Y, 


A  it  a  generalized  inverse  of  A. 


Proof 


Let  Y+  -  X  B+  -  Xtt'XfX'Y, 


nd  U+  «  Y  -  Y+  -  PY. 


Q  *  (  Y-Y++Y+-XB  )'(  Y-Y++Y+-XB  ) 


(  U+  +  Y+-XB  ) ' (  U+  -  Y+-XB  ) 


U+'U+  +  D+'(Y+-XB)  +  <Y+-XB)'U+  +  (Y+-XB) ’ (Y+-XB) . 


the  later  stage.  As  ve  expect,  the  nodal  values  are  slightly  larger  than  the 
expected  values.  Next,  we  observe  more  shrinkage  toward  the  naan  in  the 
marginal  estimate.  This  can  be  checked  by  the  sample  variances  of  each  set  of 
the  paraawters.  This  is  also  expected  since  the  farther  we  marginalize  the 
smaller  the  conditional  dispersion  matrix  becomes.  For  the  factor  scores 
coaqpare  V's  in  (4.6.12.1)  and  (4. 6. 7. 3)  and  for  the  factor  loadings,  Q’s  in 
(4.6.12.2)  and  (4.6. 3.2). 

As  far  as  this  data  set  is  concerned,  it  can  be  said  that  the  choice  of  a 
particular  set  of  the  estimates  is  purely  subjective. 


Cone Ins ions 

It  is  found  that  six  methods  to  siarginalize  parameters  behave  very 
similarly.  Therefore,  the  use  of  the  simplest  and  the  most  natural  method, 
namely,  the  Ej^p  method,  is  recommended .  It  is  also  found  that  the  method  is 

robust  to  the  choice  of  initial  estiautes.  Finally,  use  of  locally 
exchangeable  prior  distribution  for  the  factor  loadings  is  highly  recoeanended 
when  there  are  some  grouping  infonaation  of  the  variables  prior  to  the 


analysis. 
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3.  PI •(* 

4.  General  Information 

5.  Paragraph  Comprehension 

6.  Sentence  Completion 

7.  Word  Claaaif ication 

8.  Word  Meaning 

9.  Addition 

10.  Code 

11.  Counting  Dots 

12.  Straight-Carved  Capitals 

13.  Word  Recognition 

14.  Number  Recognition 

15.  Object-Number 

16.  Mufcer-Figure 

Since  it  ia  known  that  the  variables  form  four  clusters,  [Shiba  (1979)],  the 
following  four  locally  exchangeable  groups  with  a  priori  zeros  are  uaed. 

Group  1.  Variables  1,  2,  and  3,  with  zeros  in  dimensions  2,  3,  and  4. 

Group  2.  Variables  4,  5,  6,  7,  and  8,  with  zeros  in  dimensions  1,  3,  and  4. 

Group  3.  Variables  9,  10,  11,  and  12,  with  zeros  in  dimensions  1,  2,  and  4. 

Group  4.  Variables  13,  14,  IS,  and  16,  with  zeros  in  dimensions  1,  2,  and  3. 
With  this  prior  specification  the  data  set  was  analyzed  with  the 

method  and  the  results  of  eaoh  stage  are  shown  in  Tables  6(a)  through  6(h). 

We  first  notice  that  the  difference  between  the  first  stage  and  the  second 
stage  is  not  large.  Therefore,  it  may  be  reasonable  to  skip  the  second  stage 
and  use  the  conditional  expectation  of  (reciprocal  of)  the  error  variance  in 


The  estimated  hyperparaawters  are  shown  in  Table  5(g) 


Choice  of  Estimates 

As  noted  before,  the  choice  of  the  estimates  depends  on  the  investigator's 
interest.  If  we  are  interested  in  both  the  factor  scores  and  the  factor 
loadings,  the  joint  node  sonld  be  nsed.  However,  if  we  are  interested  in  one 
of  then  the  marginal  mode  should  be  nsed.  Also,  in  each  stage  of  the 
estimation  procedure,  that  is. 

Stage  1.  Marginal  Mode  of  the  Hyperparaaetera 
Stage  2.  Marginal  Mode  of  the  Error  Variances  Conditional  on 
the  Hyperparaaetera 

Stage  3.  Joint  Mode  of  the  Factor  Scores  and  the  Factor  Loadings 

Conditional  on  the  Error  Variances  and  the  Ilyperparaaeters 
Stage  4.  Marginal  Mode  of  the  Factor  Loadings 

Conditional  on  the  Error  Varianoes  and  the  Hyperparaawters 
Stage  5.  Marginal  Mode  of  the  Factor  Scores 

Conditional  on  the  Error  Variances  and  the  Hyperparaawters 
all  the  paraswter  values  are  available,  if  not  as  the  mode,  as  the  conditional 
expectation.  Therefore,  it  may  be  reasonable  not  to  perform  all  the  analysis 
if  those  parameter  values  are  similar. 

In  order  to  check  this  aspect  of  the  procedure  the  correlation  matrix  in 
Table  2(a)  was  analysed.  The  variables  used  are: 


1.  Visual  Perception 

2.  Paper  Form  Board 


9.  Mechanical  Coaqpre  hens  ion 
10.  Electronics  Information 

The  date  vara  aaalyxad  by  the  method  with  three  differeat  prior 

distribatioas  of  factor  loading*,  namely,  the  looally  exchangeable  prior  with  a 
priori  zeros  in  the  hype rparama ten,  the  looally  exchangeable  prior  with  the 
same  groupings  bat  without  a  priori  zeros,  aad  the  globally  exchangeable  prior. 
The  grouping  of  the  variables  aad  the  looatioaa  of  the  a  priori  zeros  are 
Group  1:  Variables  1,3  and  4,  zeros  in  Dimensions  2.3,  and  4. 

Group  2:  Variables  7,9,  aad  10,  zeros  in  Dimensions  1,3,  end  4. 

Group  3:  Variables  2  and  8,  zeros  in  Dimensions  1,2,  and  4. 

Group  4:  Variables  5  and  6,  zeros  in  Dimnsions  1,2,  and  3. 

These  grouping  and  a  priori  zeros  are  suggested  by  the  analysis  by  See.  et.al. 
(1981).  The  globally  exchangeable  prior  distribution  of  the  error  variance  is 
used  and  the  hyperparaawters,  a  aad  s,  are  estimated  first.  Then,  conditioned 
on  those  values  and  the  final  value  of  D,  the  marginal  node  of  the  factor 
loadings  are  calculated.  The  marginal  mode  of  D  is  not  used  since,  at  least 
with  this  data,  it  is  very  similar  to  the  final  value  of  D. 

The  results  are  shown  in  Tables  5(a)  through  5(d)  with  an  additional 
solution  from  SAS  PROC  FACTOR  MLE  with  PRIOR  SMC  option.  Table  5(e),  and  the 
oblique  solution  in  See,  et.al,  (1981),  Table  5(f).  The  results  in  Tables 
5(b)  through  5(e)  are  rotated  by  the  VARIMAX  method.  By  comparing  Table  5(a) 
and  5(b)  it  is  obvious  that  just  by  specifying  a  priori  grouping  aad  xeros  in 
the  hype rpa raise ters  we  can  attain  sisqile  structure.  Ironically,  in  this  case, 
the  VARIMAX  solution  seeau  to  be  worse  than  the  unrotsted  solution.  The  rest 
of  the  Table  5  also  shows  the  same  pattern  of  simple  structure. 


It  can  be  concluded  that  the  method  is  very  robu  *  to  the  different 


initial  configurations.  Also  this  finding  strongly  demonstrates  the 
superiority  of  the  Bayesian  factor  analysis  proposed  in  the  Chapter  IV  over  the 
usual  MLB.  Five  different  found  by  SAS  PROC  FACTOR  are  reduced  to  two 

different  Bayesian  solutions  just  by  assuming  an  informative  prior  distribution 
of  the  error  variances.  Intuitively,  it  can  be  said  that  the  prior 
distribution  has  the  effect  of  deemphasising  local  minima.  A  similar  result 
may  be  expected  from  the  use  of  the  prior  distributions  of  the  factor  loadings. 


Effect  of  the  Locally  Exchangeable  Prior  Distributions 
of  the  Factor  Loadings 

In  order  to  demonstrate  the  effect  of  the  locally  exchangeable  prior 
distributions  of  the  factor  loadings  the  correlation  matrix  of  ASVAB  Form  8a  in 
Table  2(e)  is  analyzed.  The  data  were  collected  by  Ree,  Mullins,  Mathews,  and 
Massey  (1981)  with  2620  subjects  and  consists  of  scores  on  the  following  ten 
teats. 

1.  General  Science 

2.  Arithmetic  Seasoning 

3.  Word  Knowledge 

4.  Paragraph  Comprehension 

5.  Numerical  Operations  (speeded) 

6.  Coding  Speed  (speeded) 


7.  Auto-Shop  Information 

8.  Mathematics  Knowledge 


MF  by  SAS  PROC  FACTOR  with  PRIOR  uniquenesses  given  by  the  result 
of  BLR. 

9.  SX#2 

MLE  by  SAS  PROC  FACTOR  with  the  following  PRIOR  uniquenesses 
.5  .001  .5  .5  .5  .5  .5  .5  .5 

10.  SX#7 

MJE  by  SAS  PROC  FACTOR  with  the  following  PRIOR  uniquenesses 
.5  .5  .5  .5  .5  .5  .001  .5  .5 

11.  SW9 

MLE  by  SAS  PROC  FACTOR  with  the  following  PRIOR  uniquenesses 
.5  .5  .5  .5  .5  .5  .5  .5  .001 

The  SAS  results  with  the  PRIOR  uniquenesses  given  by  the  results  of  BLN. 
BUR,  BUM,  BMR,  end  EMM  are  the  same  as  SSMC.  The  initial  configurations  are 
shown  in  Table  3. 

Each  initial  configuration  above  was  submitted  to  the  special  program  to 
check  the  robustness.  The  values  n  ■  15.0139  and  s  *=  5.48453  were  used  for  all 
the  calculations.  As  shown  in  Fig.  4  and  Tables  4(a)  through  4(d),  the 
program  resulted  in  only  two  different  solutions.  The  first  group,  where 
variable  4  has  low  error  varianoe,  consists  of  BKBLR,  EKBHR,  BOW,  and  BKSHR, 
where  BK  is  attached  to  indicate  the  results  of  the  special  program.  The 
second  group,  where  variable  4  has  high  error  variance,  consists  of  EKELN, 
BKBMR,  BKBMM,  BKSSSfC,  EKSBLM,  BKSBHR,  BKSDHM.BKSBMR,  BKSBMH,  BKS&P2,  BKSX#7  and 
EKSX#9.  The  solutions  within  each  group  are  identical  to  each  other  to  six 
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-  -2  X_1AX_1  -  di»g(X~1AX~1) ,  if  X  and  A  arc  symmetric. 

3Y_1A/aX  -  -dtr[(Y-1AY_1)  Y]/dX. 

0 

dlnlXl/dX  -  (X-1)',  if  X  ia  distinct, 

»  2X  *-diag(X  1),  if  X  ia  syomtrio. 

31n|Y|/dX  -  dtrl  Y_1Y  J/dX. 

c 


The  Gattaan/Kestelaan  Formnla 

Let  the  following  factor  analytic  model  hold, 
y  =  A£  +  Dz, 

where  jr  it  p  i  1,  f  it  r  x  1,  t  ii  p  z  1, 

A  is  p  z  r,  and  D  is  p  x  p  diagonal, 

E(f)  «=  E(z)  »  0. 

D(f )  -  I  .  D(z)  -  I  , 

-  r -  p 

Cov(f,x)  =  zero. 

These  imply 

E(jr)  ®  0,  and  D(jr)  *  AA'  +  =  ,  say,  0. 

2 

Note  that  the  error  variances  are  denoted  by  D  . 

Ke ate loan (1952)  and  Outtoian(1955)  showed  that,  given  A  and  D, 


the  model  is  not  unique  in  the  sence  that  the  variables 


PP'  *  Ir  -  A'Q  A, 

and  a  is  any  variable  which  satisfies 
E(s)  °  0,  D(s)  “I  ,  and  Cov(  y,s  )  *  zero, 

-  - -  P  ~ 

also  satisfy  the  model  requirement  stated  above 

Proof 

•  *  • 

jr  *  Af  +  Dz 

-  A(A'Q'Vps)  +  D(DQ~1jr*D"*1APs) 

-  AA'Q_1y  +  APs  +  D2Q_1y  -  APs 

-  (AA'+C2)Q-1y 
■  2- 

E(f*)  -  A'Q_1E(Z)  +  PE(s) 

-  0. 

E(z*)  =  DQ_1E(y)  +  d'1APE(s) 

*  0. 

D(f*)  -  A'Q_1  Q  Q_1A  +  P  I  ?' 


)(«  )  -  DQ  Q  0  u  +  D  AP  I  P'A'D 


.  do_1D  +  d"1A(I-A*0~1A)A'D'1 

-  D~1(dW+AA'-AA,q'1AA,)D~1 

=  d”1(D2q"1D2+AA,-<0-D2)q'1AA,)d“ 

-  D-1 ( d2q'1d2 +AA'-AA'+D2Q_1AA'))E 
.  d'1(d2d'1d2+oWd2))d'1 

,=  d‘1(d2q"1d2+o2-dV1d2)d'1 

-  D_1(Q-AA')D_1 

*  I. 

Cov(f  ,  z  )  ”  Cov(A'fl  2y,DQ  *y-D  AP*) 

-  A'0_1  Q  Q_1D  -  PP'A'D-1 

-  A'0-1D  -  (I-A'o'W'D*1 

-  A* (Q~*D-D_*+Q~2AA’D~2) 

-  A’(Q~V-D_l+<f1AA'D~1)  DO 

=  A'dfV-nc^AA')  d'1 
=  A*(Q~1(D2+AA')-I)  d'1 

-  A‘(Q'Vl)  d"1 


Z< 
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Table  1  Summary  of  the  Algorithm 

Prior  Distributions  of  F  and  A 
Globally  Exchangeable  Locally  Exchangeable 

EDAF 


The  E-step 


• 

V 

4.6. 1.1 

4.6. 1.4 

• 

Y'F 

4.6. 2.2 

4. 6.2. 6 

•  * 

F  'F 

4.6.2. 3 

4.6.2. 7 

• 

A 

4. 6. 3. 2 

4.6. 3. 5 

• 

Q 

4.6. 3.2 

4. 6. 3. 5 

e 

D 

4. 6. 4. 3 

The  M-step 

a 

4. 6. 8.1 

4. 6. 8.4 

CA 

4.6. 8.1 

4.6. 8.4 

f 

fixed 

4.6. 8.5 

°F 

fixed 

4.6. 8.5 

n 

4.6. 8.2 

e 

with  n  and  N  in  4. 6. 4.1  and  4. 6. 4. 2 

Sfa 

The  E-step 

e 

A 

4. 6. 5.1 

4. 6.3. 5* 

• 

Q 

4.6. 5.1 

4. 6. 3. 5* 

e 

V 

4.6. 7. 2 

4.6.2. 5* 

* 

Y'F 

4.6. 2.2 

4.6. 2. 6 

*  * 

F  'F 

4. 6. 2. 3 

4. 6. 2. 7 

* 

D 

4. 6. 4. 3 

The  M-step 

a 

4. 6. 8.1 

4. 6. 8.4 

CA 

4. 6. 8.1 

4. 6. 8. 4 

f 

fixed 

4.6. 8.5 

fixed 

4. 6. 8. 5 

n 

4.6. 8.2 

with  u  and  N  in  4. 6. 4.1  and  4. 6. 4. 2 

after  the  fomula  number  indicates  analogous  correction 


Table  1  (continued) 


Prior  Distributions  of  F  and  A 

Globally  Exchangeable 

Locally  Exchangeable 

eadf 

The  E-etep 

• 

V 

4.6. 1.1 

4.6. 1.4 

* 

Y'F 

4. 6. 2. 2 

4. 6. 2. 6 

•  e 

F  *F 

4.6.2. 3 

4.6. 2. 7 

e 

D 

4.6. 4.9 

• 

A 

4.6. 3.2 

4. 6. 3. 5 

e 

Q 

4. 6. 3. 2 

4.6. 3. 5 

The  M-step 

a 

4.6. 8.1 

4.6. 8.4 

CA 

4.6. 8.1 

4.6. 8.4 

f 

fixed 

4.6. 8.5 

fixed 

4.6. 8.5 

n 

4. 6. 8. 2  with  u 

and  N  in  4. 6. 4. 9  and  4. 

Vaf 

The  E-atep 

• 

V 

4.6. 1.1 

4.6. 1.4 

• 

Y'F 

4. 6. 2. 2 

4. 6. 2. 6 

*  • 

F  'F 

4.6.2. 3 

4. 6. 2. 7 

• 

A 

4.6. 3.2 

4. 6. 3. 5 

• 

Q 

4.6. 3. 2 

4. 6. 3. 5 

The  M-etep 

a 

4. 6. 8.1 

CA 

4.6. 8.1 

n 

4. 6. 8.3  with  u  and  N*  in  4. 6. 4.1  and  4 

For  the  next 

iteration  use 

d,  =  (  u  +  a  )  /  (N  +  n  +  2). 
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Table  1  (continued) 


Prior  Distributions  of  F  and  A 

Globally  Exchangeable  Locally  Exchangeable 

Vfa 

The  B-step 

• 

* 

A 

4.6. 5.1 

4. 6. 3. 5 

* 

e 

Q 

4.6. 5.1 

4. 6. 3. 5 

• 

e 

y 

4.6. 7.2 

4. 6.2.5 

* 

Y'F 

4. 6. 2. 2 

4.6. 2. 6 

*  • 

F  'F 

4. 6. 2. 3 

4. 6. 2. 7 

The  M-step 

a 

4.6. 8.1 

4.6. 8.4 

CA 

4. 6.8.1 

4.6. 8.4 

f 

fixed 

4.6. 8.5 

<* 

fixed 

4.6. 8.5 

n 

4. 6. 8. 3  with 

* 

u  and  N  in  4. 6. 4.1  and  4 

For  the  next 

iteration  nse 

V< 

Uj+s)/(N  +  n 

+  2  ). 

The  E-step 

9 

V 

4. 6. 1.1 

4.6. 3. 4 

• 

Y'F 

4. 6. 2. 2 

4. 6. 2. 6 

•  • 

F  'F 

4. 6. 2. 3 

4. 6. 2. 7 

* 

A 

4. 6. 3. 9 

• 

Q 

4. 6. 3. 9 

The  M-step 

f 

f  ixed 

4. 6. 8. 5 

f  ixed 

4. 6. 8. 5 

n 

4. 6. 8. 3  with 

o  and  N  in  4. 6. 4.1  and  4. 

For  the  next 

iteration  nse 

d.  =  ( 

n  +s)/(N  +  n 

-  r  +  2  ). 

J 

j 
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Table  2  Data  Matrices 


(a)  Correlation  Matrix  from  Ilarman  (1976) 


1.000 

n  -  : 

145 

.403 

1.000 

.468 

.305 

1.000 

.321 

.247 

.227 

1.000 

.335 

.268 

.327 

.622 

1.000 

.304 

.223 

.335 

.656 

.722 

1.000 

.332 

.382 

.391 

.578 

.527 

.619 

1.000 

.326 

.184 

.325 

.723 

.714 

.685 

.532 

1.000 

.116 

.075 

.099 

.311 

.203 

.246 

.285 

.170 

1.000 

.308 

.091 

.110 

.344 

.353 

.232 

.300 

.280 

.484 

1.000 

.314 

.140 

.160 

.215 

.095 

.181 

.271 

.113 

.585 

.428 

1.000 

.489 

.321 

.327 

.344 

.309 

.345 

.395 

.280 

.408 

.535 

.512 

1.000 

.125 

.177 

.066 

.280 

.292 

.236 

.252 

.260 

.172 

.350 

.131 

.195 

1.000 

.238 

.065 

.127 

.229 

.251 

.172 

.175 

.248 

.154 

.240 

.173 

.139 

.370 

1.000 

.176 

.177 

.187 

.208 

.273 

.228 

.255 

.274 

.289 

.362 

.278 

.194 

.341 

.345 

1.000 

.368 

.211 

.251 

.263 

.167 

.159 

.250 

.208 

.317 

.350 

.349 

.323 

.201 

.334 

.448 

1.000 

(b) 

Dispersion  Matrix  from  Francis  (1983). 

10.934 
8.104  10.709 
10.468  8.623 
8.541  10.155 
11.998  10.494 
-0.047  0.025 
2.385  2.765 
-0.626  -0.166 
-0.168  1.990 


N  -  50 


14.528 

9.629 

12.625 

0.901 

3.480 

0.867 

1.433 


14.846 

11.063  19.832 
1.892  -0.092  15.804 
3.880  2.915  12.921  17.580 

1.466  0.684  15.001  15.426  24.436 

3.292  0.323  13.498  15.365  16.602  21.326 

-1,749  -1.139  -0.453  1.873  -2.958  12.491  13.367  15.814  15.385  22.136 


->1 

-'1 


‘--1 

'-1 


(c)  Dispersion  Matrix  from  Nardia,  et.al.  (1979). 


.3023 

.1258  .1709 
.1004  .0842  .1116 
.1051  .0936  .1108  .2179 
.1161  .0979  .1205  .1538  .2944 


N  -  100 


'1 


All  the  entries  are  divided  by  1000. 


} 


Table  2  (continued) 

(d)  Correlation  Matrix  of  ASVAB  Fond  8a. 

.00 


.71 

1.00 

.83 

.70 

1.00 

.74 

.70 

.82 

1.00 

.48 

.59 

.52 

.55 

1.00 

.43 

.52 

.48 

.49 

.64 

1.00 

.70 

.60 

.68 

.63 

.40 

.42  1.00 

.65 

.79 

.62 

.60 

.58 

.51  .52  1.00 

.71 

.69 

.67 

.64 

.45 

.45  .75  .64  1.00 

.78 

.68 

.76 

.69 

.46 

.46  .79  .61  .75  1.00 

(e)  Correlation  Matrix  froa  Davie  (1944). 

L  .00 


,72 

1.00 

41 

.34 

1.00 

28 

.36 

.16 

1.00 

52 

.53 

.34 

.30 

1.00 

71 

.71 

.43 

.36 

.64  1.00 

68 

.68 

.42 

.35 

.55  .76  1.00 

51 

.52 

.28 

.29 

.45  .57  .59  1.00 

68 

.68 

.41 

.36 

.55  .76  .68  .58  1.00 

N  -  2620 


N  -  421 
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Table  3  Analysis  of  Davis'  Data:  Initial  Factor  Loadings  and  Error  Variances 


BLR 


1  2 

Error  Varince 

1 

0.812843  -0.114657 

0.331688 

2 

0.816658  -0.014656 

0.338511 

3 

0.477504  -0.077121 

0.767954 

4 

0.453904  0.741009 

0.246798 

5 

0.676648  -0.005734 

0.546000 

6 

0.896015  -0.056022 

0.200804 

7 

0.842199  -0.039545 

0.295139 

8 

0.660842  -0.005573 

0.566964 

9 

0.841190  -0.025113 

0.297765 

BLM 


1 

2 

Error  Varince 

1 

0.855872 

-0.440427 

0.073185 

2 

0.810301 

-0.057998 

0.339893 

3 

0.476637 

0.000079 

0.772769 

4 

0.405390 

0.146887 

0.814058 

5 

0.672354 

0.128670 

0.531306 

6 

0.893300 

0.124247 

0.186432 

7 

0.834940 

0.078102 

0.296642 

8 

0.654167 

0.112733 

0.559281 

9 

0.833737 

0.076118 

0.298954 

BUR 


1  2 

Error  Varince 

1 

0.810807  -0.122295 

0.330455 

2 

0.813890  -0.010746 

0.339987 

3 

0.476344  -0.087920 

0.755056 

4 

0.439384  0.620345 

0.422007 

5 

0.674253  0.008224 

0.541602 

6 

0.889749  -0.043316 

0.212918 

7 

0.838562  -0.03097 6 

0.299621 

8 

0.659494  0.012749 

0.560605 

9 

0.837584  -0.013738 

0.301959 
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Table  6  (continued) 


(b)  Factor  Loadings  and  Factor  Scoring  Weights 
Values  at  the  End  of  Marginal  Estimation  of  Error  Variance. 

Factor  Loadings 


1 

2 

3 

4 

1 

0.584022 

-0.337871 

0.044703 

0.088162 

2 

0.541687 

-0.254189 

-0.023022 

0.048515 

3 

0.536513 

-0.324311 

-0.032304 

0.060920 

4 

0.005327 

-0.712261 

0.005046 

-0.000537 

5 

-0.024393 

-0.734924 

-0.021617 

0.002592 

6 

-0.018424 

-0.730419 

-0.016213 

0.001663 

7 

0.085915 

-0.651269 

0.075602 

-0.008608 

8 

-0.049310 

-0.753803 

-0.043608 

0.005107 

9 

0.000102 

-0.190575 

0.598915 

0.132620 

10 

0.060305 

-0.291032 

0.599291 

0.170080 

11 

0.158430 

-0.185615 

0.561312 

0.043399 

12 

0.395364 

-0.321665 

0.524598 

0.007841 

13 

-0.012577 

-0.220312 

0.118324 

0.511894 

14 

0.010144 

-0.169255 

0.105527 

0.50811 6 

15 

0.061011 

-0.227214 

0.177197 

0.504992 

16 

0.206767 

-0.183465 

0.260190 

0.488820 

Sample  Mean  and  Dispersion  of  Factor  Loadings 
12  3  4 


0.158805  -0.393011  0.183371  0.160349 


1  0.047588 

2  0.017093 

3  -0.003962 

4  -0.009201 


0.017093  -0.003962  -0.009201 
0.050459  0.029291  0.028556 
0.029291  0.056644  0.004383 
0.028556  0.004383  0.041575 
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Table  6  (continued) 
(a)  (continued) 


Factor  Scoring  Weight 

12  3  4 

1  0.394413  -0.030042  -0.079415  0.0171 96 

2  0.281880  -0.013292  -0.077284  0.001809 

3  0.300054  -0.029260  -0.093438  0.009034 

4  -0.068900  -0.220555  -0.044782  -0.049292 

5  -0.101037  -0.249806  -0.067369  -0.042865 

6  -0.098456  -0.254434  -0.065290  -0.046505 

7  -0.000655  -0.150542  0.001760  -0.059018 

8  -0.133333  -0.280495  -0.089655  -0.037181 

9  -0.090953  -0.007862  0.321248  -0.005793 

10  -0.060997  -0.026252  0.31271 6  0.016743 

11  0.028308  -0.000603  0.294939  -0.078495 

12  0.225164  -0.021997  0.293564  -0.143138 

13  -0.050183  -0.016455  -0.021071  0.291471 

14  -0.035100  -0.006594  -0.027347  0.303153 

15  -0.018764  -0.012775  -0.000553  0.315139 

16  0.064876  0.009493  0.031664  0.288458 

Saaple  Dispersion  of  Factor  Scores 

12  3  4 

1  0.655222  -0.073111  0.088669  0.023238 

2  -0.073111  1.085088  -0.083745  -0.105452 

3  0.088669  -0.083745  0.777002  0.089481 

4  0.023238  -0.105452  0.089481  0.597317 
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Table  6  Analysis  of  Harman's  Data 

(a)  Factor  Loadings  and  Factor  Scoring  Weights 
Values  at  the  End  of  Estimation  of  Hyperparameters . 

Factor  Loadings 


1  0.583695  -0.337757  0.044130  0.087959 

2  0.541807  -0.254016  -0.022888  0.048518 

3  0.536692  -0.324043  -0.032077  0.060933 

4  0.005473  -0.712123  0.005168  -0.000551 

5  -0.023985  -0.734585  -0.021257  0.002549 

6  -0.018178  -0.730202  -0.015998  0.001643 

7  0.085502  -0.651552  0.075240  -0.008566 

8  -0.048773  -0.753366  -0.043133  0.005051 

9  0.001400  -0.190835  0.598659  0.132125 

10  0.061218  -0.291066  0.599099  0.169649 

11  0.158644  -0.185795  0.561296  0.043424 

12  0.394458  -0.321751  0.524847  0.008443 

13  -0.011712  -0.220081  0.118846  0.511821 

14  0.010753  -0.169247  0.105990  0.508071 

15  0.061607  -0.227154  0.177626  0.504945 

16  0.206248  -0.183886  0.260088  0.488898 

Sample  Mean  and  Dispersion  of  Factor  Loadings 
12  3  4 

0.159053  -0.392966  0.183477  0.160307 

1  0.047453  0.017109  -0.003968  -0.009156 

2  0.017109  0.050408  0.029247  0.028539 

3  -0.003968  0.029247  0.056595  0.004396 

4  -0.009156  0.028539  0.004396  0.041564 


Table  5  (continued) 


(g)  (continued) 


Group  4:  Variables  3  and  6. 


^  -  [  .0.  .0.  .0.  665442  J. 

CA4 

12  3  4 

1  0.091414  -0.056729  -0.089281  -0.002489 

2  -0.056729  0.038457  0.054028  -0.001978 

3  -0.089281  0.054028  0.087783  0.003923 

4  -0.002489  -0.001978  0.003923  0.003883 


Hyperparaaieters  for  the  Error  Variances 


n  -  16.058422  s  -  3.429397 
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Table  5  (continued) 


(g)  Hype  rpa ram  ter  a 

With  Locally  Exchangeable  Prior  with  A  Priori  Zero* 
Hyperparamter*  for  the  Factor  Loading* 


Group  1:  Variables  1,  3  and  4. 


ax  -  l  .724779,  .0.  .0.  .0  ] , 


12  3  4 

1  0.004783  0.002168  0.002714  0.001054 

2  0.002168  0.179911  0.113805  -0.066294 

3  0.002714  0.113805  0.072865  -0.043123 

4  0.001054  -0.066294  -0.043123  0.031040 


Group  2:  Variable*  7.  9  and  10. 


.0.  -.716819.  .0,  .0  ]. 


12  3  4 

1  0.172095  0.000920  -0.102238  0.060819 

2  0.000920  0.002812  -0.004285  0.000267 

3  -0.102238  -0.004285  0.070773  -0.037518 

4  0.060819  0.000267  -0.037518  0.021920 


Group  3:  Variable*  2  and  8. 


a3  -  t  .0.  .0.  -.664751.  .0  ], 


A3 


1 


2  3 


4 


1  0.165055  -0.141059  0.001925 

2  -0.141059  0.120626  -0.001293 

3  0.001925  -0.001293  0.001697 

4  0.101284  -0.086814  -0.000033 


0.101284 

-0.086814 

-0.000033 

0.063032 
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Table  5  (continued) 


(•)  Factor  Loadings  (  Varimax  Rotation  ) 
KLE  by  SAS  PRDC  FACTOR  with  SMC  Option 


1 

2 

3 

4 

Error  Variance 

1 

0.52648 

0.58392 

0.21769 

0.36209 

.20338 

1 

0.37697 

0.35478 

0.38362 

0.62558 

.19332 

3 

0.42621 

0.77377 

0.28933 

0.24984 

.07349 

4 

0.39202 

0.61580 

0.36376 

0.28062 

.25605 

•  *  * 

5 

0.13159 

0.22804 

0.73929 

0.27133 

.30485 

6 

0.23203 

0.17051 

0.70087 

0.17277 

.39602 

. 

7 

0.81020 

0.29513 

0.22333 

0.16403 

.17969 

8 

0.29934 

0.26396 

0.38636 

0.69531 

.20799 

9 

0.67360 

0.27751 

0.25067 

0.38481 

.25834 

10 

0.6°873 

0.41570 

0.24666 

0.28149 

.19890 

(f)  Factor  Loading a 

(  Oblique  Rotation  fron  See,  el.al  (1981)) 


1 

2 

3 

4 

1 

.54 

.27 

.26 

-.04 

*  * 

2 

.21 

.15 

.59 

.14 

*■ 

3 

.70 

.16 

.13 

.08 

4 

.62 

.12 

.15 

.57 

5 

.13 

.08 

.19 

.57 

.  * 

6 

.07 

.20 

.10 

.56 

7 

.23 

.68 

.04 

.01 

*. 

8 

.10 

.12 

.62 

.17 

9 

.13 

.58 

.29 

.00 

•*  4 

10 

.33 

.56 

.14 

.02 

JLjL 
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Table  3  (continued) 

(c)  Factor  Loadings  (  Varivax  Rotation  ) 
Vith  Locally  Exchangeable  Prior 


1 

2 

3 

4 

1 

0.5440 

0.2128 

-0.3580 

-0.5906 

2 

0.3905 

0.3750 

-0.6304 

-0.3650 

3 

0.4487 

0.2879 

-0.2563 

-0.7573 

4 

0.4006 

0.3459 

-0.2893 

-0.6332 

5 

0.1639 

0.6747 

-0.3099 

-0.2516 

6 

0.2259 

0.7870 

-0.1666 

-0.1626 

7 

0.8133 

0.2143 

-0.1720 

-0.2960 

8 

0.3180 

0.3782 

-0.7160 

-0.2700 

9 

0.6899 

0.2480 

-0.3850 

-0.2776 

10 

0.7143 

0.2449 

-0.2797 

-0.4131 

(d)  Factor  Loadings  (  Varivax  Rotation  ) 
Vith  Globally  Exchangeable  Prior 


1 

2 

3 

4 

1 

0.5311 

0.2211 

-0.3624 

-0.5924 

2 

0.3720 

0.3724 

-0.6205 

-0.3583 

3 

0.4328 

0.2894 

-0.2534 

-0.7586 

4 

0.3913 

0.3621 

-0.2788 

-0.6252 

5 

0.1521 

0.7155 

-0.2794 

-0.2350 

6 

0.2347 

0.7190 

-0.1691 

-0.1677 

7 

0.8105 

0.2242 

-0.1670 

-0.2968 

8 

0.3059 

0.3921 

-0.6988 

-0.2665 

9 

0.6746 

0.2508 

-0.3850 

-0.2788 

10 

0.6936 

0.2382 

-0.2799 

-0.4174 

Table  S  Analysis  of  ASVAB  Form  8A 

(a)  Factor  Loadings  (  without  Rotation  ) 

With  Locally  Exchangeable  Prior  with  A  Priori  Zeros 


1 

2 

3 

4 

0.6581 

-0.5029 

-0.3158 

0.0948 

0.4516 

-0.3784 

-0.6236 

0.2511 

0.8200 

-0.3936 

-0.2161 

0.1678 

0.6998 

-0.3598 

-0.2662 

0.2375 

0.3398 

-0.1627 

-0.3523 

0.6040 

0.2594 

-0.2249 

-0.2263 

0.7275 

0.3784 

-0.7902 

-0.1407 

0.1401 

0.3574 

-0.3147 

-0.7048 

0.2514 

0.3641 

-0.6724 

-0.3596 

0.1529 

0.4950 

-0.6877 

-0.2474 

0.1507 

(b)  Factor  Loadings  (  Variaax  Rotation  ) 


With  Locally  Exchangeable  Prior  with  A  Priori  Zeros 


1 

2 

3 

4 

0.5852 

-0.5346 

-0.3532 

0.2043 

0.3562 

-0.3805 

-0.6258 

0.3684 

0.7532 

-0.4394 

-0.2514 

0.2802 

0.6271 

-0.3936 

-0.2857 

0.3415 

0.2457 

-0.1595 

-0.3040 

0.6727 

0.1576 

-0.2175 

-0.1627 

0.7744 

0.2928 

-0.8062 

-0.1682 

0.2076 

0.2627 

-0.3073 

-0.6992 

0.3665 

0.2728 

-0.6794 

-0.3793 

0.2399 

0.4085 

-0.7083 

-0.2759 

0.2393 
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1 

1  0.937479 

2  0.003313 


1 


Table  4  (continued) 

(d)  Final  Value  of  F*'F*. 

Solution  with  Low  Error  Varinace  on  Variable  #4. 

2 

0.005313 

0.607890 

Solution  with  High  Error  Varinace  on  Variable  #4. 
2 


1  0.940400  -0.019373 

2  -0.019373  0.364831 
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Table  4  (continued) 


(o)  Final  Conditional  Expectation  of  Factor  Score  Weights. 


Solution  with  Low  Error  Variance  on  Variable  #4. 
1  2 

1  0.155969  -0.153019 

2  0.151353  -0.027197 

3  0.040342  -0.046109 

4  0.075110  0.833996 

5  0.078564  -0.006782 

6  0.266449  -0.116043 

7  0.177608  -0.062457 

8  0.074163  -0.005465 

9  0.175636  -0.041495 


Solution  with  High  Error  Varinaee  on  Variable  #4. 
1  2 


1 

0.185876 

-0.762509 

2 

0.144439 

-0.246647 

3 

0.037639 

0.009005 

4 

0.034176 

0.118204 

5 

0.083909 

0.198785 

6 

0.276851 

0.363198 

7 

0.170171 

0.126135 

8 

0.075363 

0.136785 

9 

0.168515 

0.121119 
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Table  4  Analysis  of  Davis'  Data:  Results 
(a)  Final  Conditional  Expectation  of  Error  Varinaces 


Solution  with  Low  Error  Varinaee  on  Variable  #4. 

0.328924  0.336681  0.749110  0.322409  0.536633  0.209846 

0.296318  0.555788  0.298808 

Solution  with  High  Error  Varinaee  on  Variable  #4. 

0.232446  0.316048  0.754362  0.793634  0.513588  0.198474 

0.297366  0.546577  0.299606 


(b)  Final  Conditional  Expectation  of  Factor  Loadings. 


Solution  with  Low  Error  Varinaee  on  Variable  #4. 
1  2 

1  0.810576  -0.117377 

2  0.813995  -0.012322 

3  0.476424  -0.081633 

4  0.446113  0.691789 

5  0.674320  -0.000144 

6  0.890054  -0.050043 

7  0.838720  -0.035834 

8  0.659376  0.001189 

9  0.837695  -0.020270 


Solution  with  High  Error  Varinaee  on  Variable  #4. 
1  2 

1  0.823812  -0.304174 

2  0.813901  -0.147551 

3  0.477665  -0.003874 

4  0.411161  0.135154 

5  0.677542  0.140070 

6  0.893923  0.086225 

7  0.838166  0.033489 

8  0.659422  0.097594 

9  0.836848  0.031608 


Table  3  (continued) 


SX#7 


1 

2 

Error  Varince 

1 

0.680000 

0.443260 

0.341120 

2 

0.680000 

0.447630 

0.337230 

3 

0.420000 

0.222940 

0.773900 

4 

0.350000 

0.212910 

0.832170 

5 

0.550000 

0.393430 

0.542710 

6 

0.760000 

0.464740 

0.206240 

7 

1.000000 

0.000000 

0.000000 

8 

0.590000 

0.287690 

0.569130 

9 

0.680000 

0.507700 

0.279840 

SX#9 


1 

2 

Error  Varince 

1 

0.680000 

0.442940 

0.341400 

2 

0.680000 

0.446050 

0.338640 

3 

0.410000 

0.243240 

0.772730 

4 

0.360000 

0.192510 

0.889333 

5 

0.550000 

0.392760 

0.543240 

6 

0.760000 

0.464030 

0.207070 

7 

0.680000 

0.510450 

0.277040 

8 

0.580000 

0.307850 

0.568830 

9 

1.000000 

0.000000 

0.000000 

Table  3  (continued) 


SSMC,  SBLM, 
1 


1  1.000000 

2  0.720000 

3  0.410000 

4  0.280000 

5  0.520000 

6  0.710000 

7  0.680000 

8  0.510000 

9  0.680000 


SBLR 

1 


1  0.280000 

2  0.360000 

3  0.160000 

4  1.000000 

5  0.300000 

6  0.360000 

7  0.350000 

8  0.290000 

9  0.360000 


sm 

1 


1  0.720000 

2  1.000000 

3  0.340000 

4  0.360000 

5  0.530000 

6  0.710000 

7  0.680000 

8  0.520000 

9  0.680000 


SBHR,  SBHM,  SBMR,  and  SBMH 


2  Error  Varinoa 


0.000000  0.000000 

0.369970  0.344720 

0.242340  0.773170 

0.326190  0.815200 

0.443850  0.532600 

0.556080  0.186680 

0.491090  0.296430 

0.424890  0.559370 

0.488650  0.298820 


2  Error  Varinoe 


0.766790  0.333630 
0.728960  0.339010 
0.453180  0.769030 
0.000000  0.000000 
0.603130  0.546230 
0.818510  0.200450 
0.763130  0.295130 
0.590430  0.567290 
0.756610  0.297930 


2  Error  Varlnce 


0.365890  0.347720 
0.000000  0.000000 
0.357170  0.756830 
0.195780  0.832070 
0.427760  0.536120 
0.558710  0.183750 
0.491730  0.295800 
0.406140  0.564650 
0.488240  0.299220 
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Table  3  (continued) 


BUM 


1 

2 

Error  Varince 

1 

0.811119 

-0.121935 

0.330069 

2 

0.813922 

-0.009310 

0.339990 

3 

0.476510 

-0.088143 

0.754991 

4 

0.437326 

0.610792 

0.4352 66 

5 

0.674248 

0.010934 

0.541635 

6 

0.889839 

-0.040549 

0.212984 

7 

0.838630 

-0.028670 

0.299661 

8 

0.659503 

0.016008 

0.560582 

9 

0.837628 

-0.011161 

0.301968 

EUR 


1 

2 

Error  Varince 

1 

0.836022 

-0.382014 

0.155784 

2 

0.811075 

-0.103317 

0.333018 

3 

0.476852 

-0.012453 

0.776130 

4 

0.408311 

0.139907 

0.817584 

5 

0.675165 

0.123325 

0.531444 

6 

0.896054 

0.102585 

0.187412 

7 

0.836854 

0.049606 

0.298595 

8 

0.656418 

0.097122 

0.562332 

9 

0.835626 

0.047953 

0.300820 

BUM 


1 

2 

Error  Varince 

1 

0.836910 

-0.380007 

0.155828 

2 

0.811316 

-0.101433 

0.333012 

3 

0.476880 

-0.011333 

0.776130 

4 

0.407981 

0.140862 

0.817585 

5 

0.674874 

0.124915 

0.531442 

6 

0.895811 

0.104686 

0.187411 

7 

0.836735 

0.051562 

0.298595 

8 

0.656188 

0.098659 

0.562332 

9 

0.835511 

0.049907 

0.300820 

Tabic  6  (continued) 


(b)  (continued) 


Factor  Scoring  Weight 

12  3  4 

1  0.396668  -0.030005  -C. 079840  0.017604 

2  0.282561  -0.013289  -0.077726  0.002023 

3  0.300887  -0.029298  -0.094023  0.009303 

4  -0.069127  -0.220696  -0.044808  -0.049597 

5  -0.101663  -0.250306  -0.067666  -0.043101 

6  -0.098912  -0.254818  -0.065462  -0.046813 

7  -0.000359  -0.150332  0.002056  -0.059417 

8  -0.134299  -0.281308  -0.090174  -0.037372 

9  -0.092510  -0.007878  0.322904  -0.005700 

10  -0.062101  -0.026296  0.313909  0.016912 

11  0.028051  -0.000590  0.296059  -0.079150 

12  0.226944  -0.022005  0.294864  -0.144876 

13  -0.050628  -0.016448  -0.021576  0.292930 

14  -0.035381  -0.006523  -0.027898  0.304816 

15  -0.019035  -0.012715  -0.001049  0.316910 

16  0.065508  0.009780  0.031427  0.289864 

Saaple  Dispersion  of  Factor  Scores 

12  3  4 

1  0.661182  -0.072420  0.088229  0.022859 

2  -0.072420  1.087983  -0.083686  -0.105571 

3  0.088229  -0.083686  0.783112  0.088769 

4  0.022859  -0.105571  0.088769  0.603667 
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Table  6  (continued) 

(c)  Factor  Loadings  and  Factor  Scoring  Weighta 
Values  at  the  End  of  Joint  Estimation  of  Factor  Scorea  and  Factor  Loadings. 

Factor  Loadings 

12  3  4 


1  0.612345  -0.599289  0.087146  0.158986 

2  0.564085  -0.457232  0.010543  0.103659 

3  0.558300  -0.540573  0.000103  0.118598 

4  -0.114895  -0.803456  -0.100861  0.011532 

5  -0.147203  -0.828065  -0.129800  0.014875 

6  -0.137196  -0.820510  -0.12079 6  0.013547 

7  -0.005696  -0.720830  -0.004974  0.000636 

8  -0.175978  -0.849867  -0.155236  0.017726 

9  -0.331131  -0.159671  0.672917  0.292350 

10  -0.168156  -0.314258  0.661643  0.322302 

11  -0.018781  -0.272320  0.616332  0.201005 

12  0.429215  -0.554887  0.551572  0.152400 

13  -0.184234  -0.112750  -0.071843  0.525962 

14  -0.158000  -0.059276  -0.083435  0.52177 6 

15  -0.103943  -0.162369  0.016787  0.519876 


16  0.129169  -0.209889  0.217741  0.498008 


Sample  Mean  and  Dispersion  of  Factor  Loadings 
12  3  4 

0.045494  -0.466578  0.135490  0.217077 

1  0.092029  -0.012986  -0.002124  -0.013083 

2  -0.012986  0.076038  0.034554  0.049943 

3  -0.002124  0.034554  0.088389  0.011916 

4  -0.013083  0.049943  0.011916  0.038570 
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Table  6  (continued) 
(e)  (continued) 


Factor  Scoring  Weight 

12  3  4 

1  0.321475  -0.076134  -0.011602  0.073320 

2  0.221562  -0.041760  -0.028399  0.043020 

3  0.237909  -0.056945  -0.037947  0.051954 

4  -0.101998  -0.180280  -0.062740  -0.053232 

5  -0.132437  -0.200977  -0.086976  -0.050169 

6  -0.129012  -0.204547  -0.083143  -0.054189 

7  -0.024690  -0.126345  -0.002552  -0.059505 

8  -0.164554  -0.223292  -0.112158  -0.047452 

9  -0.190523  -0.013054  0.289455  0.047816 

10  -0.123135  -0.033973  0.281129  0.066094 

11  -0.032573  -0.030558  0.267324  -0.004878 

12  0.229642  -0.083671  0.274954  -0.040951 

13  -0.059283  0.014240  -0.105013  0.288855 

14  -0.049334  0.022418  -0.113743  0.303809 

15  -0.035622  0.010478  -0.082566  0.308744 

16  0.061710  0.007897  0.001744  0.263956 

Sanple  Dispersion  of  Factor  Scores 

12  3  4 

1  0.526156  0.118747  0.035452  -0.005656 

2  0.118747  0.841797  -0.030994  -0.197943 

3  0.035452  -0.030994  0.620489  0.091274 

4  -0.005656  -0.197943  0.091274  0.707033 
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Table  6  (continued) 


(d)  Factor  Loadings  and  Factor  Scoring  Weights 
Values  at  the  End  of  Marginal  Estimation  of  Factor  Loadings. 


Factor  Loadings 


1  2 


4 


1  0.590076 

2  0.549209 

3  0.543649 

4  -0.079764 

5  -0.106069 

6  -0.097399 

7  0.010937 

8  -0.129254 

9  -0.208041 

10  -0.103776 

11  0.013599 

12  0.332875 

13  -0.124651 

14  -0.101330 

15  -0.062553 

16  0.116852 


-0.561924 

-0.429081 

-0.505775 

-0.776750 

-0.796811 

-0.790265 

-0.708142 

-0.814379 

-0.187953 

-0.318461 

-0.268345 

-0.493583 

-0.136912 

-0.088096 

-0.177377 

-0.210490 


0.052008 

-0.012859 

-0.022812 

-0.069955 

-0.093605 

-0.085782 

0.009638 

-0.114118 

0.647443 

0.641745 

0.607520 

0.564576 

-0.013462 

-0.024507 

0.056320 

0.208957 


0.139681 

0.090081 

0.103710 

0.008106 

0.010848 

0.009661 

-0.000935 

0.013140 

0.244572 

0.278818 

0.180310 

0.161941 

0.520344 

0.516577 

0.515738 

0.498821 


Sample  Mean  and  Dispersion  of  Factor  Loadinga 


12  3  4 

0.071523  -0.454021  0.146944  0.205713 


1  0.069742  -0.005311 

2  -0.005311  0.064975 

3  -0.006141  0.032524 

4  -0.010388  0.044741 


-0.006141 

0.032524 

0.078656 

0.010502 


-0.010388 

0.044741 

0.010502 

0.038153 


•7*1 

-1 

•  > 
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Table  6  (continued) 
(d)  (continued) 


Factor  Scoring  Weight 

12  3  4 

1  0.310975  -0.071712  -0.025250  0.068637 

2  0.216437  -0.039193  -0.034988  0.040520 

3  0.232531  -0.053251  -0.044705  0.048807 

4  -0.079177  -0.173407  -0.043572  -0.060132 

5  -0.103658  -0.192315  -0.062792  -0.058892 

6  -0.100379  -0.195927  -0.059090  -0.062856 

7  -0.016103  -0.123747  0.004636  -0.062054 

8  -0.129295  -0.212685  -0.082522  -0.058155 

9  -0.128161  -0.018218  0.282117  0.020503 

10  -0.078863  -0.035845  0.276962  0.040581 

11  -0.016123  -0.030623  0.266015  -0.016500 

12  0.171872  -0.07354 6  0.284514  -0.033907 

13  -0.038000  0.012078  -0.083439  0.276675 

14  -0.028535  0.019496  -0.091102  0.290919 

15  -0.018683  0.009014  -0.065872  0.299253 

16  0.056413  0.007544  -0.001865  0.265621 

Saaple  Dispersion  of  Factor  Scores 

12  3  4 

1  0.526156  0.118767  0.035453  -0.005663 

2  0.118767  0.841779  -0.030984  -0.197950 

3  0.035453  -0.030984  0.620473  0.091274 

4  -0.005663  -0.197950  0.091274  0.707048 
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Table  6  (continued) 


(e)  Factor  Loading*  and  Factor  Scoring  Weights 
Valnes  at  the  End  of  Marginal  Estimation  of  Factor  Scores. 

Factor  Loadings 

12  3  4 

1  0.612883  -0.620351  0.087765  0.163808 

2  0.564815  -0.472804  0.011541  0.107391 

3  0.559069  -0.557269  0.001153  0.122592 

4  -0.120888  -0.808003  -0.106144  0.012138 

5  -0.153182  -0.832600  -0.135066  0.01547 6 

6  -0.143460  -0.8252 60  -0.126312  0.014180 

7  -0.012304  -0.725842  -0.010794  0.001311 

8  -0.181844  -0.854317  -0.160402  0.018312 

9  -0.342213  -0.162270  0.675906  0.300225 

10  -0.191989  -0.322731  0.663786  0.330311 

11  -0.024473  -0.283294  0.619279  0.211781 

12  0.436680  -0.578528  0.553292  0.164836 

13  -0.194428  -0.104943  -0.083961  0.526721 

14  -0.169842  -0.050546  -0.097312  0.522669 

15  -0.114489  -0.158021  0.006412  0.520785 

16  0.122458  -0.214506  0.215417  0.498839 


Sample  Mean  and  Dispersion  of  Factor  Loadings 


12  3  4 

0.040425  -0.473205  0.132160  0.220711 

1  0.094559  -0.015765  -0.001086  -0.013216 

2  -0.015765  0.078125  0.033405  0.051106 

3  -0.001086  0.033405  0.090609  0.012573 

4  -0.013216  0.051106  0.012573  0.038553 


i 
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Table  6  (continued) 
(e)  (continued) 


Factor  Scoring  Weight 

12  3  4 

1  0.310729  -0.076873  -0.014801  0.077513 

2  0.214367  -0.042087  -0.030034  0.045990 

3  0.230309  -0.057105  -0.039526  0.054979 

4  -0.101932  -0.176757  -0.062581  -0.055287 

5  -0.131321  -0.196949  -0.086196  -0.052904 

6  -0.128394  -0.200537  -0.082699  -0.056773 

7  -0.027509  -0.124206  -0.004153  -0.059769 

8  -0.162279  -0.218703  -0.110677  -0.050922 

9  -0.190825  -0.012891  0.286579  0.047963 

10  -0.122128  -0.033905  0.277122  0.066889 

11  -0.036273  -0.031106  0.262660  -0.000217 

12  0.223221  -0.085000  0.266164  -0.030328 

13  -0.056747  0.016223  -0.107366  0.286299 

14  -0.047645  0.024393  -0.116936  0.301758 

15  -0.034660  0.012265  -0.085416  0.306006 

16  0.059241  0.008256  -0.000125  0.261205 

Saaple  Dispersion  of  Factor  Scores 

12  3  4 

1  0.502935  0.126244  0.025704  -0.003952 

2  0.126244  0.813467  -0.021044  -0.192335 

3  0.025704  -0.021044  0.597943  0.087372 

4  -0.003952  -0.192335  0.087372  0.705390 
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Table  6  (continued) 

(f)  Factor  Loadings  and  Factor  Scoring  Weights 
Error  Variances  at  the  End  of  Ryperparaaeter  Estimation. 

0.463089  0.625698  0.573485  0.383050  0.355957  0.346092 

0.481711  0.330248  0.491035  0.475497  0.493230  0.411494 

0.644330  0.620561  0.565170  0.562536 


MEAN  AM)  VARIANCE  OF  ERROR  VARIANCES  -  0.488949  0.009997 


(g)  Factor  Loadings  and  Factor  Scoring  Weights 
Error  Variances  at  the  End  of  Marginal  Estimate  of  Error  Variance. 

0.455959  0.617735  0.565945  0.378151  0.351078  0.341424 
0.476000  0.325488  0.483854  0.468998  0.486393  0.405130 
0.635949  0.612180  0.557435  0.554944 


MEAN  AM)  VARIANCE  OF  ERROR  VARIANCES  -  0.482291  0.009758 


Tabid  6  (continued) 


(h)  Factor  Loadings  and  Factor  Scoring  Weights 
Hyperparaae  te r  s 


Hype rpa roasters  of  the  Error  Variances 

n  -  35.509313  s  -  16.601690 


Hype rparaae ter s  of  the  Factor  Loadings 


Group  1:  Variables  it  1,  2,  and  3. 


n  «  [  .554064,  .0  .0  .0  ] 


CA1 

1  2 

1  0.000901  -0.000338 

2  -0.00033 8  0.097439 

3  0.001448  0.000604 

4  0.000530  -0.021174 


3  4 

0.001448  0.000530 
0.000604  -0.021174 
0.002367  0.000611 
0.000611  0.004843 


Group  2:  Variables  it  4,  5,  6,  7  and  9. 
a2  «  [  .0.  -.716366,  .0,  .0  ] 

CA2 

12  3  4 

1  0.002903  0.002197  0.002550  -0.000292 

2  0.002197  0.001671  0.001935  -0.000221 

3  0.002550  0.001935  0.002258  -0.000258 

4  -0.000292  -0.000221  -0.000258  0.000039 


Tabic  6  (continued) 


(h)  (continued) 


Group  3:  Variables  ft  9,  10,  11,  and  12. 
a3  -  I  .0,  .0,  .570975,  .0  ] 

CA3 

12  3  4 

1  0.048505  -0.044281  -0.004950  0.004370 

2  -0.044281  0.066629  0.000686  -0.022250 

3  -0.004950  0.000686  0.001085  0.002238 

4  0.004370  -0.022250  0.002238  0.013150 


Group  4:  Variables  #  13,  14,  15,  and  16. 

a.  -  [  .0.  .0,  .0.  .503434  ] 

-4 

CA4 

12  3  4 

1  0.013770  -0.012277  0.017359  -0.000967 

2  -0.012277  0.043087  -0.034108  -0.000208 

3  0.0173S9  -0.034108  0.032701  -0.000599 

4  -0.000967  -0.000208  -0.000599  0.000115 
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Figure.  1  Various  Estimates  of  Error  Variances  of  the  Henan  Data 


N  =  145 


D.F. 

LAMBDA 

MEAN 

VARIANCE 

\.* 

i: 

M.L.E.  BY  SAS  PMC  FACTOR 

0,477034 

0.022715 

V* 

2: 
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Error  Variance 


3.  4*  5*  6.  7.  8* 


Fisure.  2  Various  Estiaates  of  Error  Variances  of  the  Francis  Data 


8  *  50 


I!  N.L.E.  BY  SAS  PROC  FACTOR 
2*.  NAR61NAL  ESTIMATE  OF  E-VAR  BY  E-AF 
3»  MARGINAL  ESTIMATE  OF  E-VAR  BY  E-FA 
4!  LOADINGS  BY  FORMULA*  E-VAR  BY  LOG-NORMAL 
Si  LOADINGS  BY  E-AF.  ERR-VAR  BY  LOG-NORMAL 

6!  LOADINGS  BY  E-fA.  ERR-VAR  BY  LOG-MMAL 

7*  LOADINGS  BY  E-AF.  ERR-VAR  BY  EM  AFTER  E-AF 

8!  LOADINGS  BY  E-FA.  ERR-VAR  BY  EM  AFTER  E-fA 

9i  LOADINGS  BY  E-AF.  ERR-VAR  BY  EM  AFTER  E-F 
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delete  Factor  Loadings  and  Factor  Scoring  Weights 
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Bayesian  Factor  Analysis* 

Shin-ichi  Mayekawa 
The  University  of  Iowa 
Abstract 

A  new  Bayesian  procedure  for  factor  analysis  is  developed  in  which 
factor  scores  as  well  as  factor  loadings  and  error  variances  are  treated 
as  parameters  of  interest.  The  presentation  is  fully  Bayesian  in  the 
sense  that  all  the  parameters  have  prior  distributions  and  the  posterior 
mode  of  a  subset  of  the  parameters  is  used  as  the  point  estimate. 

The  model  is  a  standard  one  where  the  observations  are  expresssed 
as  the  sum  of  the  linear  combination  of  factor  scores,  with  factor 
loadings  being  the  weights,  and  a  normal  error  term.  As  the  prior  distri¬ 
bution  the  following  exchangeable  form  is  assumed: 

A  factor  score  vector  for  each  observation  has  a  common  normal 
distribution. 

A  factor  loading  vector  for  each  variable  has  a  common  normal 
distribution. 

A  error  variance  for  each  variable  has  a  common  inverted  chi 
square  distribution. 

When  the  exchangeability  of  all  the  observations/variables  is  in  question 
observations/variables  may  be  divided  into  several  subsets  and  the 
observations/ variables  within  each  subset  may  be  treated  as  exchangeable. 

Since  the  posterior  marginal  distribution  of  factor  loadings  and 
error  variances  can  be  expressed  as  the  product  of  the  covariance-based 
likelikhood  and  the  prior  distributions  of  factor  loadings  and  error 
variances  the  proposed  method  includes  both  the  random  and  the  fixed 
factor  analysis  models. 

The  mode  of  the  hyperparameters  is  first  derived  from  their  posterior 
marginal  distributions  and  conditional  on  those  values  the  mode  of  error 
variance  is  derived  from  their  posterior  marginal  distributions.  Then, 
conditional  of  those  estimates,  the  point  estimate  of  factor  scores 
and  factor  loadings  are  derived  as  the  joint  or  the  marginal  mode  of  the 
posterior  distribution  of  factor  scores  and  factor  loadings  depending  on 
the  investigator's  interest. 

The  marginalization  is  done  via  some  variations  of  the  EM  algorithm 
and  it  is  found  that  the  different  variations  result  in  almost  identical 
estimates.  It  is  also  found  that  the  effect  of  the  prior  distribution  of 
error  variances  is  such  that  it  reduces  the  number  of  local  maxima. 
Finally,  by  specifying  a  priori  zeros  in  the  locational  hyperparameters 
of  factor  loadings,  a  simple  structure  can  be  obtained  without  rotation. 


*Support  for  this  research  was  provided  under  contract  // N00014-83- 
C-0514  with  the  Personnel  Training  Branch  of  the  Office  of  Naval  Research, 
Melvin  R.  Novick,  Principal  Investigator.  I  am  indebted  to  Professor 
Novick  and  Dr.  Ming-mei  Wang  for  their  comments  on  earlier  drafts.  Also, 

I  would  like  to  thank  Professor  Tom  Leonard  of  the  University  of  Wisconsin. 
Some  of  the  methods  used  in  chapter  IV  were  originally  proposed  by  him. 
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